Showing 1–50 of 112 results for author: Zhang, A

Searching in archive eess.
  1. arXiv:2406.12426  [pdf, other]

    cs.IT eess.SP

    Multi-Active-IRS-Assisted Cooperative Sensing: Cramér-Rao Bound and Joint Beamforming Design

    Authors: Yuan Fang, Xianghao Yu, Jie Xu, Ying-Jun Angela Zhang

    Abstract: This paper studies the multi-intelligent reflecting surface (IRS)-assisted cooperative sensing, in which multiple active IRSs are deployed in a distributed manner to facilitate multi-view target sensing at the non-line-of-sight (NLoS) area of the base station (BS). Different from prior works employing passive IRSs, we leverage active IRSs with the capability of amplifying the reflected signals to…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2404.13536

  2. arXiv:2406.09190  [pdf, other]

    eess.SP

    Rethinking Waveform for 6G: Harnessing Delay-Doppler Alignment Modulation

    Authors: Zhiqiang Xiao, Xianda Liu, Yong Zeng, J. Andrew Zhang, Shi Jin, Rui Zhang

    Abstract: Waveform design has served as a cornerstone for each generation of mobile communication systems. The future sixth-generation (6G) mobile communication networks are expected to employ larger-scale antenna arrays and exploit higher-frequency bands for further boosting data transmission rate and providing ubiquitous wireless sensing. This brings new opportunities and challenges for 6G waveform design…

    Submitted 13 June, 2024; originally announced June 2024.

  3. arXiv:2406.05700  [pdf, other]

    cs.CV eess.IV

    HDMba: Hyperspectral Remote Sensing Imagery Dehazing with State Space Model

    Authors: Hang Fu, Genyun Sun, Yinhe Li, Jinchang Ren, Aizhu Zhang, Cheng Jing, Pedram Ghamisi

    Abstract: Haze contamination in hyperspectral remote sensing images (HSI) can lead to spatial visibility degradation and spectral distortion. Haze in HSI exhibits spatial irregularity and inhomogeneous spectral distribution, with few dehazing networks available. Current CNN and Transformer-based dehazing methods fail to balance global scene recovery, local detail retention, and computational efficiency. Ins…

    Submitted 9 June, 2024; originally announced June 2024.

  4. arXiv:2405.16011  [pdf, ps, other]

    eess.SP

    Semantic Importance-Aware Communications with Semantic Correction Using Large Language Models

    Authors: Shuaishuai Guo, Yanhu Wang, Jia Ye, Anbang Zhang, Kun Xu

    Abstract: Semantic communications, a promising approach for agent-human and agent-agent interactions, typically operate at a feature level, lacking true semantic understanding. This paper explores understanding-level semantic communications (ULSC), transforming visual data into human-intelligible semantic content. We employ an image caption neural network (ICNN) to derive semantic representations from visua…

    Submitted 24 May, 2024; originally announced May 2024.

  5. arXiv:2405.10553  [pdf, other]

    eess.SP

    Revealing the Trade-off in ISAC Systems: The KL Divergence Perspective

    Authors: Zesong Fei, Shuntian Tang, Xinyi Wang, Fanghao Xia, Fan Liu, J. Andrew Zhang

    Abstract: Integrated sensing and communication (ISAC) is regarded as a promising technique for 6G communication network. In this letter, we investigate the Pareto bound of the ISAC system in terms of a unified Kullback-Leibler (KL) divergence performance metric. We firstly present the relationship between KL divergence and explicit ISAC performance metric, i.e., demodulation error and probability of detecti…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 5 pages, 5 figures; submitted to IEEE journals for possible publication

  6. arXiv:2404.09149  [pdf, other]

    eess.SY cs.NE math.NA

    Heuristic Solution to Joint Deployment and Beamforming Design for STAR-RIS Aided Networks

    Authors: Bai Yan, Qi Zhao, Jin Zhang, J. Andrew Zhang

    Abstract: This paper tackles the deployment challenges of Simultaneous Transmitting and Reflecting Reconfigurable Intelligent Surface (STAR-RIS) in communication systems. Unlike existing works that use fixed deployment setups or solely optimize the location, this paper emphasizes the joint optimization of the location and orientation of STAR-RIS. This enables searching across all user grouping possibilities…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: 30 pages

  7. arXiv:2404.05984  [pdf, ps, other]

    eess.SP

    Interference Management for Full-Duplex ISAC in B5G/6G Networks: Architectures, Challenges, and Solutions

    Authors: Aimin Tang, Xudong Wang, J. Andrew Zhang

    Abstract: Integrated sensing and communications (ISAC) has been visioned as a key technique for B5G/6G networks. To support monostatic sensing, a full-duplex radio is indispensable to extract echo signals from targets. Such a radio can also greatly improve network capacity via full-duplex communications. However, full-duplex radios in existing ISAC designs are mainly focused on wireless sensing, while the a…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted by IEEE Communications Magazine

  8. arXiv:2403.12630  [pdf, other]

    eess.AS cs.SD

    Reproducing the Acoustic Velocity Vectors in a Circular Listening Area

    Authors: Jiarui Wang, Thushara Abhayapala, Jihui Aimee Zhang, Prasanga Samarasinghe

    Abstract: Acoustic velocity vectors are important for human's localization of sound at low frequencies. This paper proposes a sound field reproduction algorithm, which matches the acoustic velocity vectors in a circular listening area. In previous work, acoustic velocity vectors are matched either at sweet spots or on the boundary of the listening area. Sweet spots restrict listener's movement, whereas meas…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Submitted to EUSIPCO 2024

  9. arXiv:2403.11940  [pdf, other]

    cs.LG eess.SY

    Multistep Inverse Is Not All You Need

    Authors: Alexander Levine, Peter Stone, Amy Zhang

    Abstract: In real-world control settings, the observation space is often unnecessarily high-dimensional and subject to time-correlated noise. However, the controllable dynamics of the system are often far simpler than the dynamics of the raw observations. It is therefore desirable to learn an encoder to map the observation space to a simpler space of control-relevant variables. In this work, we consider the…

    Submitted 18 March, 2024; originally announced March 2024.

  10. arXiv:2403.05793  [pdf, ps, other]

    eess.SP

    Performance Bounds for Passive Sensing in Asynchronous ISAC Systems -- Appendices

    Authors: Jingbo Zhao, Zhaoming Lu, J. Andrew Zhang, Weicai Li, Yifeng Xiong, Zijun Han, Xiangming Wen, Tao Gu

    Abstract: This document contains the appendices for our paper titled "Performance Bounds for Passive Sensing in Asynchronous ISAC Systems." The appendices include rigorous derivations of key formulas, detailed proofs of the theorems and propositions introduced in the paper, and details of the algorithm tested in the numerical simulation for validation. These appendices aim to support and elaborate on the f…

    Submitted 29 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: 5 pages

  11. arXiv:2402.17533  [pdf, other]

    cs.CV eess.IV

    Black-box Adversarial Attacks Against Image Quality Assessment Models

    Authors: Yu Ran, Ao-Xiang Zhang, Mingjie Li, Weixuan Tang, Yuan-Gen Wang

    Abstract: The goal of No-Reference Image Quality Assessment (NR-IQA) is to predict the perceptual quality of an image in line with its subjective evaluation. To put the NR-IQA models into practice, it is essential to study their potential loopholes for model refinement. This paper makes the first attempt to explore the black-box adversarial attacks on NR-IQA models. Specifically, we first formulate the atta…

    Submitted 28 February, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  12. arXiv:2402.09048  [pdf, other]

    eess.SP

    Sensing in Bi-Static ISAC Systems with Clock Asynchronism: A Signal Processing Perspective

    Authors: Kai Wu, Jacopo Pegoraro, Francesca Meneghello, J. Andrew Zhang, Jesus O. Lacruz, Joerg Widmer, Francesco Restuccia, Michele Rossi, Xiaojing Huang, Daqing Zhang, Giuseppe Caire, Y. Jay Guo

    Abstract: Integrated Sensing and Communication (ISAC) has been identified as a pillar usage scenario for the impending 6G era. Bi-static sensing, a major type of sensing in ISAC, is promising to expedite ISAC in the near future, as it requires minimal changes to the existing network infrastructure. However, a critical challenge for bi-static sensing is clock asynchronism due to the use of different clocks a…

    Submitted 24 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: 20 pages, 6 figures, 1 table

  13. arXiv:2401.15183  [pdf, other]

    q-bio.BM eess.IV

    Moment-based metrics for molecules computable from cryo-EM images

    Authors: Andy Zhang, Oscar Mickelin, Joe Kileel, Eric J. Verbeke, Nicholas F. Marshall, Marc Aurèle Gilles, Amit Singer

    Abstract: Single particle cryogenic electron microscopy (cryo-EM) is an imaging technique capable of recovering the high-resolution 3-D structure of biological macromolecules from many noisy and randomly oriented projection images. One notable approach to 3-D reconstruction, known as Kam's method, relies on the moments of the 2-D images. Inspired by Kam's method, we introduce a rotationally invariant metric…

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: 21 Pages, 9 Figures, 2 Algorithms, and 3 Tables

  14. arXiv:2401.09119  [pdf, other]

    eess.SP

    Anchor-points Assisted Uplink Sensing in Perceptive Mobile Networks

    Authors: Yanmo Hu, J. Andrew Zhang, Weibo Deng, Y. Jay Guo

    Abstract: Uplink sensing in integrated sensing and communications (ISAC) systems, such as Perceptive Mobile Networks, is challenging due to the clock asynchronism between transmitter and receiver. Existing solutions typically require the presence of a dominating line-of-sight path and the knowledge of transmitter location at the receiver. In this paper, relaxing these requirements, we propose a novel and ef…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: 14 pages, 12 figures, journal paper

  15. arXiv:2401.09064  [pdf, other]

    cs.IT eess.SP

    Performance Bounds and Optimization for CSI-Ratio based Bi-static Doppler Sensing in ISAC Systems

    Authors: Yanmo Hu, Kai Wu, J. Andrew Zhang, Weibo Deng, Y. Jay Guo

    Abstract: Bi-static sensing is crucial for exploring the potential of networked sensing capabilities in integrated sensing and communications (ISAC). However, it suffers from the challenging clock asynchronism issue. CSI ratio-based sensing is an effective means to address the issue. Its performance bounds, particular for Doppler sensing, have not been fully understood yet. This work endeavors to fill the r…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: 14 pages, 15 figures, journal paper

  16. arXiv:2401.03473  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge

    Authors: He Wang, Pengcheng Guo, Yue Li, Ao Zhang, Jiayao Sun, Lei Xie, Wei Chen, Pan Zhou, Hui Bu, Xin Xu, Binbin Zhang, Zhuo Chen, Jian Wu, Longbiao Wang, Eng Siong Chng, Sun Li

    Abstract: To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge. This challenge collects over 100 hours of multi-channel speech data recorded inside a new energy vehicle and 40 hours…

    Submitted 20 February, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

    Comments: Accepted at ICASSP 2024

  17. arXiv:2312.09760  [pdf, other]

    eess.AS cs.SD

    U2-KWS: Unified Two-pass Open-vocabulary Keyword Spotting with Keyword Bias

    Authors: Ao Zhang, Pan Zhou, Kaixun Huang, Yong Zou, Ming Liu, Lei Xie

    Abstract: Open-vocabulary keyword spotting (KWS), which allows users to customize keywords, has attracted increasingly more interest. However, existing methods based on acoustic models and post-processing train the acoustic model with ASR training criteria to model all phonemes, making the acoustic model under-optimized for the KWS task. To solve this problem, we propose a novel unified two-pass open-vocabu…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted by ASRU2023

  18. Densifying MIMO: Channel Modeling, Physical Constraints, and Performance Evaluation for Holographic Communications

    Authors: Y. Liu, M. Zhang, T. Wang, A. Zhang, M. Debbah

    Abstract: As the backbone of the fifth-generation (5G) cellular network, massive multiple-input multiple-output (MIMO) encounters a significant challenge in practical applications: how to deploy a large number of antenna elements within limited spaces. Recently, holographic communication has emerged as a potential solution to this issue. It employs dense antenna arrays and provides a tractable model. Nevert…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: 14 pages, 20 figures, accepted by JSAC-SI-ESIT

  19. arXiv:2310.07141  [pdf, ps, other]

    cs.IT eess.SP

    Time and Frequency Offset Estimation and Intercarrier Interference Cancellation for AFDM Systems

    Authors: Yuankun Tang, Anjie Zhang, Miaowen Wen, Yu Huang, Fei Ji, Jinming Wen

    Abstract: Affine frequency division multiplexing (AFDM) is an emerging multicarrier waveform that offers a potential solution for achieving reliable communications over time-varying channels. This paper proposes two maximum-likelihood (ML) estimators of symbol time offset and carrier frequency offset for AFDM systems. One is called joint ML estimator, which evaluates the arrival time and carrier frequency o…

    Submitted 28 December, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: accepted by IEEE Wireless Communications and Networking Conference (WCNC) 2024

  20. arXiv:2310.05444  [pdf, other]

    cs.IT eess.SP

    Waveform Design for MIMO-OFDM Integrated Sensing and Communication System: An Information Theoretical Approach

    Authors: Zhiqing Wei, Jinghui Piao, Xin Yuan, Huici Wu, J. Andrew Zhang, Zhiyong Feng, Lin Wang, Ping Zhang

    Abstract: Integrated sensing and communication (ISAC) is regarded as the enabling technology in the future 5th-Generation-Advanced (5G-A) and 6th-Generation (6G) mobile communication system. ISAC waveform design is critical in ISAC system. However, the difference of the performance metrics between sensing and communication brings challenges for the ISAC waveform design. This paper applies the unified perfor…

    Submitted 9 October, 2023; originally announced October 2023.

  21. arXiv:2310.04657  [pdf, other]

    eess.AS cs.SD

    Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition

    Authors: Kaixun Huang, Ao Zhang, Binbin Zhang, Tianyi Xu, Xingchen Song, Lei Xie

    Abstract: The attention-based deep contextual biasing method has been demonstrated to effectively improve the recognition performance of end-to-end automatic speech recognition (ASR) systems on given contextual phrases. However, unlike shallow fusion methods that directly bias the posterior of the ASR model, deep biasing methods implicitly integrate contextual information, making it challenging to control t…

    Submitted 6 October, 2023; originally announced October 2023.

    Comments: Accepted by ASRU2023

  22. arXiv:2310.03265  [pdf, other]

    cs.NI eess.SP

    Integrated Communication, Sensing, and Computation Framework for 6G Networks

    Authors: Xu Chen, Zhiyong Feng, J. Andrew Zhang, Zhaohui Yang, Xin Yuan, Xinxin He, Ping Zhang

    Abstract: In the sixth generation (6G) era, intelligent machine network (IMN) applications, such as intelligent transportation, require collaborative machines with communication, sensing, and computation (CSC) capabilities. This article proposes an integrated communication, sensing, and computation (ICSAC) framework for 6G to achieve the reciprocity among CSC functions to enhance the reliability and latency…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: 8 pages, 5 figures, submitted to IEEE VTM

  23. arXiv:2309.13609  [pdf, other]

    cs.CV eess.IV

    Vulnerabilities in Video Quality Assessment Models: The Challenge of Adversarial Attacks

    Authors: Ao-Xiang Zhang, Yu Ran, Weixuan Tang, Yuan-Gen Wang

    Abstract: No-Reference Video Quality Assessment (NR-VQA) plays an essential role in improving the viewing experience of end-users. Driven by deep learning, recent NR-VQA models based on Convolutional Neural Networks (CNNs) and Transformers have achieved outstanding performance. To build a reliable and practical assessment system, it is of great necessity to evaluate their robustness. However, such issue has…

    Submitted 20 October, 2023; v1 submitted 24 September, 2023; originally announced September 2023.

  24. arXiv:2309.10605  [pdf, other]

    eess.AS cs.SD

    An Active Noise Control System Based on Soundfield Interpolation Using a Physics-informed Neural Network

    Authors: Yile Angela Zhang, Fei Ma, Thushara Abhayapala, Prasanga Samarasinghe, Amy Bastine

    Abstract: Conventional multiple-point active noise control (ANC) systems require placing error microphones within the region of interest (ROI), inconveniencing users. This paper designs a feasible monitoring microphone arrangement placed outside the ROI, providing a user with more freedom of movement. The soundfield within the ROI is interpolated from the microphone signals using a physics-informed neural…

    Submitted 19 September, 2023; originally announced September 2023.

  25. arXiv:2309.02888  [pdf, other]

    eess.SP

    Multi-Device Task-Oriented Communication via Maximal Coding Rate Reduction

    Authors: Chang Cai, Xiaojun Yuan, Ying-Jun Angela Zhang

    Abstract: In task-oriented communications, most existing work designed the physical-layer communication modules and learning based codecs with distinct objectives: learning is targeted at accurate execution of specific tasks, while communication aims at optimizing conventional communication metrics, such as throughput maximization, delay minimization, or bit error rate minimization. The inconsistency betwee…

    Submitted 28 May, 2024; v1 submitted 6 September, 2023; originally announced September 2023.

    Comments: under minor revision in IEEE Transactions on Wireless Communications

  26. arXiv:2308.02915  [pdf, other]

    cs.GR cs.CV cs.SD eess.AS

    DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation

    Authors: Qiaosong Qi, Le Zhuo, Aixi Zhang, Yue Liao, Fei Fang, Si Liu, Shuicheng Yan

    Abstract: When hearing music, it is natural for people to dance to its rhythm. Automatic dance generation, however, is a challenging task due to the physical constraints of human motion and rhythmic alignment with target music. Conventional autoregressive methods introduce compounding errors during sampling and struggle to capture the long-term structure of dance sequences. To address these limitations, we…

    Submitted 5 August, 2023; originally announced August 2023.

    Comments: Accepted at ACM MM 2023

  27. arXiv:2307.14907  [pdf, other]

    eess.IV cs.CV q-bio.QM

    Weakly Supervised AI for Efficient Analysis of 3D Pathology Samples

    Authors: Andrew H. Song, Mane Williams, Drew F. K. Williamson, Guillaume Jaume, Andrew Zhang, Bowen Chen, Robert Serafin, Jonathan T. C. Liu, Alex Baras, Anil V. Parwani, Faisal Mahmood

    Abstract: Human tissue and its constituent cells form a microenvironment that is fundamentally three-dimensional (3D). However, the standard-of-care in pathologic diagnosis involves selecting a few two-dimensional (2D) sections for microscopic evaluation, risking sampling bias and misdiagnosis. Diverse methods for capturing 3D tissue morphologies have been developed, but they have yet had little translation…

    Submitted 27 July, 2023; originally announced July 2023.

  28. arXiv:2307.11345  [pdf, other]

    cs.IT eess.SP

    Sensing Aided Covert Communications: Turning Interference into Allies

    Authors: Xinyi Wang, Zesong Fei, Peng Liu, J. Andrew Zhang, Qingqing Wu, Nan Wu

    Abstract: In this paper, we investigate the realization of covert communication in a general radar-communication cooperation system, which includes integrated sensing and communications as a special example. We explore the possibility of utilizing the sensing ability of radar to track and jam the aerial adversary target attempting to detect the transmission. Based on the echoes from the target, the extended…

    Submitted 3 January, 2024; v1 submitted 21 July, 2023; originally announced July 2023.

    Comments: 13 pages, 12 figures, submitted to IEEE journals for potential publication

  29. arXiv:2307.07200  [pdf, other]

    eess.AS

    Reproducing the Acoustic Velocity Vectors in a Spherical Listening Region

    Authors: Jiarui Wang, Thushara Abhayapala, Jihui Aimee Zhang, Prasanga Samarasinghe

    Abstract: Acoustic velocity vectors (AVVs) are related to the human's perception of sound at low frequencies and are widely used in Ambisonics. This paper proposes a spatial sound field reproduction algorithm called velocity matching, which reproduces the AVVs in the spherical listening region by matching the AVVs' spherical harmonic coefficients. Using the sound field translation formula, the spherical har…

    Submitted 6 June, 2024; v1 submitted 14 July, 2023; originally announced July 2023.

    Comments: Submitted to IEEE Signal Processing Letters

  30. arXiv:2306.10982  [pdf, other]

    cs.IT cs.CR cs.LG eess.SP

    Differentially Private Over-the-Air Federated Learning Over MIMO Fading Channels

    Authors: Hang Liu, Jia Yan, Ying-Jun Angela Zhang

    Abstract: Federated learning (FL) enables edge devices to collaboratively train machine learning models, with model communication replacing direct data uploading. While over-the-air model aggregation improves communication efficiency, uploading models to an edge server over wireless networks can pose privacy risks. Differential privacy (DP) is a widely used quantitative technique to measure statistical data…

    Submitted 25 December, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

    Comments: This work has been accepted by the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  31. arXiv:2306.09135  [pdf, other]

    eess.AS cs.SD

    Time-Domain Wideband Image Source Method for Spherical Microphone Arrays

    Authors: Jiarui Wang, Prasanga Samarasinghe, Thushara Abhayapala, Jihui Aimee Zhang

    Abstract: This paper presents the time-domain wideband spherical microphone array impulse response generator (TDW-SMIR generator), which is a time-domain wideband image source method (ISM) for generating the room impulse responses captured by an open spherical microphone array. To incorporate loudspeaker directivity, the TDW-SMIR generator considers a source that emits a sequence of spherical wave fronts wh…

    Submitted 9 August, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: Accepted for publication in the IEEE 25th International Workshop on Multimedia Signal Processing (IEEE MMSP 2023)

  32. arXiv:2306.04512  [pdf, other]

    eess.IV cs.CV physics.med-ph

    Cross-attention learning enables real-time nonuniform rotational distortion correction in OCT

    Authors: Haoran Zhang, Jianlong Yang, Jingqian Zhang, Shiqing Zhao, Aili Zhang

    Abstract: Nonuniform rotational distortion (NURD) correction is vital for endoscopic optical coherence tomography (OCT) imaging and its functional extensions, such as angiography and elastography. Current NURD correction methods require time-consuming feature tracking or cross-correlation calculations and thus sacrifice temporal resolution. Here we propose a cross-attention learning method for the NURD corr…

    Submitted 5 January, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

    Journal ref: Biomedical Optics Express 15.1 (2024): 319-335

  33. arXiv:2306.00804  [pdf, other]

    cs.SD cs.CL eess.AS

    Adaptive Contextual Biasing for Transducer Based Streaming Speech Recognition

    Authors: Tianyi Xu, Zhanheng Yang, Kaixun Huang, Pengcheng Guo, Ao Zhang, Biao Li, Changru Chen, Chao Li, Lei Xie

    Abstract: By incorporating additional contextual information, deep biasing methods have emerged as a promising solution for speech recognition of personalized words. However, for real-world voice assistants, always biasing on such personalized words with high prediction scores can significantly degrade the performance of recognizing common words. To address this issue, we propose an adaptive contextual bias…

    Submitted 15 August, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

  34. arXiv:2305.17938  [pdf, other]

    cs.IT eess.SP

    Complex CNN CSI Enhancer for Integrated Sensing and Communications

    Authors: Xu Chen, Zhiyong Feng, J. Andrew Zhang, Feifei Gao, Xin Yuan, Zhaohui Yang, Ping Zhang

    Abstract: In this paper, we propose a novel complex convolutional neural network (CNN) CSI enhancer for integrated sensing and communications (ISAC), which exploits the correlation between the sensing parameters (such as angle-of-arrival and range) and the channel state information (CSI) to significantly improve the CSI estimation accuracy and further enhance the sensing accuracy. Within the CNN CSI enhance…

    Submitted 19 June, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

    Comments: 13 pages, 15 figures, submitted to IEEE Journal of Selected Topics in Signal Processing

  35. arXiv:2305.12493  [pdf, other]

    eess.AS cs.CL cs.SD

    Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network

    Authors: Kaixun Huang, Ao Zhang, Zhanheng Yang, Pengcheng Guo, Bingshen Mu, Tianyi Xu, Lei Xie

    Abstract: Contextual information plays a crucial role in speech recognition technologies and incorporating it into the end-to-end speech recognition models has drawn immense interest recently. However, previous deep bias methods lacked explicit supervision for bias tasks. In this study, we introduce a contextual phrase prediction network for an attention-based deep bias method. This network predicts context…

    Submitted 12 July, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: Accepted by interspeech2023

  36. arXiv:2305.11548  [pdf, ps, other]

    eess.SP

    Sensing Aided Uplink Transmission in OTFS ISAC with Joint Parameter Association, Channel Estimation and Signal Detection

    Authors: Xi Yang, Hang Li, Qinghua Guo, J. Andrew Zhang, Xiaojing Huang, Zhiqun Cheng

    Abstract: In this work, we study sensing-aided uplink transmission in an integrated sensing and communication (ISAC) vehicular network with the use of orthogonal time frequency space (OTFS) modulation. To exploit sensing parameters for improving uplink communications, the parameters must be first associated with the transmitters, which is a challenging task. We propose a scheme that jointly conducts paramet…

    Submitted 19 May, 2023; originally announced May 2023.

  37. arXiv:2304.11057  [pdf, other]

    eess.SP

    Vital Sign Monitoring in Dynamic Environment via mmWave Radar and Camera Fusion

    Authors: Yingqi Wang, Zhongqin Wang, J. Andrew Zhang, Haimin Zhang, Min Xu

    Abstract: Contact-free vital sign monitoring, which uses wireless signals for recognizing human vital signs (i.e., breath and heartbeat), is an attractive solution to health and security. However, the subject's body movement and the change in actual environments can result in inaccurate frequency estimation of heartbeat and respiratory. In this paper, we propose a robust mmWave radar and camera fusion system…

    Submitted 20 April, 2023; originally announced April 2023.

  38. arXiv:2303.06341  [pdf, other]

    eess.AS

    The NPU-ASLP System for Audio-Visual Speech Recognition in MISP 2022 Challenge

    Authors: Pengcheng Guo, He Wang, Bingshen Mu, Ao Zhang, Peikun Chen

    Abstract: This paper describes our NPU-ASLP system for the Audio-Visual Diarization and Recognition (AVDR) task in the Multi-modal Information based Speech Processing (MISP) 2022 Challenge. Specifically, the weighted prediction error (WPE) and guided source separation (GSS) techniques are used to reduce reverberation and generate clean signals for each single speaker first. Then, we explore the effectivenes…

    Submitted 11 March, 2023; originally announced March 2023.

    Comments: 2 pages, accepted by ICASSP 2023

  39. arXiv:2303.04696  [pdf, other]

    eess.IV cs.CV cs.LG q-bio.QM

    VOLTA: an Environment-Aware Contrastive Cell Representation Learning for Histopathology

    Authors: Ramin Nakhli, Allen Zhang, Hossein Farahani, Amirali Darbandsari, Elahe Shenasa, Sidney Thiessen, Katy Milne, Jessica McAlpine, Brad Nelson, C Blake Gilks, Ali Bashashati

    Abstract: In clinical practice, many diagnosis tasks rely on the identification of cells in histopathology images. While supervised machine learning techniques require labels, providing manual cell annotations is time-consuming due to the large number of cells. In this paper, we propose a self-supervised framework (VOLTA) for cell representation learning in histopathology images using a novel technique that…

    Submitted 8 March, 2023; originally announced March 2023.

  40. Joint Beamforming for RIS-Assisted Integrated Sensing and Communication Systems

    Authors: Yongqing Xu, Yong Li, J. Andrew Zhang, Marco Di Renzo, Tony Q. S. Quek

    Abstract: Integrated sensing and communications (ISAC) is an emerging critical technique for the next generation of communication systems. However, due to multiple performance metrics used for communication and sensing, the limited degrees-of-freedom (DoF) in optimizing ISAC systems poses a challenge. Reconfigurable intelligent surfaces (RIS) can introduce new DoF for beamforming in ISAC systems, thereby en…

    Submitted 24 January, 2024; v1 submitted 3 March, 2023; originally announced March 2023.

    Comments: 30 pages, 8 figures. This paper has been accepted by IEEE Transactions on Communications

  41. arXiv:2302.13523  [pdf, other]

    cs.SD eess.AS

    VE-KWS: Visual Modality Enhanced End-to-End Keyword Spotting

    Authors: Ao Zhang, He Wang, Pengcheng Guo, Yihui Fu, Lei Xie, Yingying Gao, Shilei Zhang, Junlan Feng

    Abstract: The performance of the keyword spotting (KWS) system based on audio modality, commonly measured in false alarms and false rejects, degrades significantly under the far field and noisy conditions. Therefore, audio-visual keyword spotting, which leverages complementary relationships over multiple modalities, has recently gained much attention. However, current studies mainly focus on combining the e…

    Submitted 14 March, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

    Comments: 5 pages. Accepted at ICASSP2023

  42. arXiv:2302.06044  [pdf, other]

    eess.SP

    Air-Ground Integrated Sensing and Communications: Opportunities and Challenges

    Authors: Zesong Fei, Xinyi Wang, Nan Wu, Jingxuan Huang, J. Andrew Zhang

    Abstract: The air-ground integrated sensing and communications (AG-ISAC) network, which consists of unmanned aerial vehicles (UAVs) and ground terrestrial networks, offers unique capabilities and demands special design techniques. In this article, we provide a review on AG-ISAC, by introducing UAVs as "relay" nodes for both communications and sensing to resolve the power and computation constraints on UAV…

    Submitted 12 February, 2023; originally announced February 2023.

    Comments: 7 pages, 4 figures. To appear in IEEE Communications Magazine

  43. arXiv:2301.11501  [pdf, ps, other]

    eess.SP

    Practical Frequency-Hopping MIMO Joint Radar Communications: Design and Experiment

    Authors: Jiangtao Liu, Kai Wu, Tao Su, J. Andrew Zhang

    Abstract: Joint radar and communications (JRC) can realize two radio frequency (RF) functions using one set of resources, greatly saving hardware, energy and spectrum for wireless systems needing both functions. Frequency-hopping (FH) MIMO radar is a popular candidate for JRC, as the achieved communication symbol rate can greatly exceed radar pulse repetition frequency. However, practical transceiver imperf…

    Submitted 26 January, 2023; originally announced January 2023.

    Comments: 11 pages; 12 figures

  44. arXiv:2211.11248  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Video Background Music Generation: Dataset, Method and Evaluation

    Authors: Le Zhuo, Zhaokai Wang, Baisen Wang, Yue Liao, Chenxi Bao, Stanley Peng, Songhao Han, Aixi Zhang, Fei Fang, Si Liu

    Abstract: Music is essential when editing videos, but selecting music manually is difficult and time-consuming. Thus, we seek to automatically generate background music tracks given video input. This is a challenging task since it requires music-video datasets, efficient architectures for video-to-music generation, and reasonable metrics, none of which currently exist. To close this gap, we introduce a comp…

    Submitted 4 August, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: Accepted by ICCV2023

  45. arXiv:2211.04644  [pdf, other]

    cs.IT eess.SP

    Kalman Filter-based Sensing in Communication Systems with Clock Asynchronism

    Authors: Xu Chen, Zhiyong Feng, J. Andrew Zhang, Xin Yuan, Ping Zhang

    Abstract: In this paper, we propose a novel Kalman Filter (KF)-based uplink (UL) joint communication and sensing (JCAS) scheme, which can significantly reduce the range and location estimation errors due to the clock asynchronism between the base station (BS) and user equipment (UE). Clock asynchronism causes time-varying time offset (TO) and carrier frequency offset (CFO), leading to major challenges in up…

    Submitted 19 June, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

    Comments: 13 pages, 16 figures, submitted to IEEE transactions on signal processing, under review

  46. arXiv:2211.04065  [pdf, other]

    cs.IT eess.SP

    Downlink and Uplink Cooperative Joint Communication and Sensing

    Authors: Xu Chen, Zhiyong Feng, J. Andrew Zhang, Zhiqing Wei, Xin Yuan, Ping Zhang

    Abstract: Downlink (DL) and uplink (UL) joint communication and sensing (JCAS) technologies have been individually studied for realizing sensing using DL and UL communication signals, respectively. Since the spatial environment and JCAS channels in the consecutive DL and UL JCAS time slots are generally unchanged, DL and UL JCAS may be jointly designed to achieve better sensing performance. In this paper, w…

    Submitted 26 April, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

    Comments: 14 pages, 10 figures, submitted to IEEE Transactions on Vehicular Technology

  47. arXiv:2211.04064  [pdf, other]

    cs.IT eess.SP

    Multiple Signal Classification Based Joint Communication and Sensing System

    Authors: Xu Chen, Zhiyong Feng, Zhiqing Wei, Xin Yuan, Ping Zhang, J. Andrew Zhang, Heng Yang

    Abstract: Joint communication and sensing (JCS) has become a promising technology for mobile networks because of its higher spectrum and energy efficiency. Up to now, the prevalent fast Fourier transform (FFT)-based sensing method for mobile JCS networks is on-grid based, and the grid interval determines the resolution. Because the mobile network usually has limited consecutive OFDM symbols in a downlink (D…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: 30 pages, 10 figures, major revision to IEEE Transactions on Wireless Communications

  48. arXiv:2211.04062  [pdf, other]

    cs.IT eess.SP

    Concurrent Downlink and Uplink Joint Communication and Sensing for 6G Networks

    Authors: Xu Chen, Zhiyong Feng, Zhiqing Wei, J. Andrew Zhang, Xin Yuan, Ping Zhang

    Abstract: Joint communication and sensing (JCAS) is a promising technology for 6th Generation (6G) mobile networks, such as intelligent vehicular networks, intelligent manufacturing, and so on. Equipped with two spatially separated antenna arrays, the base station (BS) can perform downlink active JCAS in a mono-static setup. This paper proposes a Concurrent Downlink and Uplink (CDU) JCAS system where the BS…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: 5 pages, 5 figures, submitted to IEEE transactions on vehicular technology correspondence

  49. arXiv:2211.03250  [pdf, other]

    cs.IT eess.SP

    Uplink Sensing Using CSI Ratio in Perceptive Mobile Networks

    Authors: Zhitong Ni, J. Andrew Zhang, Kai Wu, Ren Ping Liu

    Abstract: Uplink sensing in perceptive mobile networks (PMNs), which uses uplink communication signals for sensing the environment around a base station, faces challenging issues of clock asynchronism and the requirement of a line-of-sight (LOS) path between transmitters and receivers. The channel state information (CSI) ratio has been applied to resolve these issues, however, current research on the CSI ra…

    Submitted 6 November, 2022; originally announced November 2022.

  50. arXiv:2211.01585  [pdf, other]

    cs.SD eess.AS

    The ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge (ICSRC): Dataset, Tracks, Baseline and Results

    Authors: Ao Zhang, Fan Yu, Kaixun Huang, Lei Xie, Longbiao Wang, Eng Siong Chng, Hui Bu, Binbin Zhang, Wei Chen, Xin Xu

    Abstract: This paper summarizes the outcomes from the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge (ICSRC). We first address the necessity of the challenge and then introduce the associated dataset collected from a new-energy vehicle (NEV) covering a variety of cockpit acoustic conditions and linguistic contents. We then describe the track arrangement and the baseline system. Specifically, w…

    Submitted 3 November, 2022; originally announced November 2022.

    Comments: Accepted by ISCSLP2022