Showing 1–50 of 552 results for author: Wang, W

Searching in archive eess.

  1. arXiv:2406.17877  [pdf, other]

    eess.SY

    Equity-aware Load Shedding Optimization

    Authors: Xin Fang, Wenbo Wang, Fei Ding

    Abstract: Load shedding is usually the last resort to balance generation and demand to maintain stable operation of the electric grid after major disturbances. Current load-shedding optimization practices focus mainly on the physical optimality of the network power flow. This might lead to an uneven allocation of load curtailment, disadvantaging some loads more than others. Addressing this oversight, this p… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Contact email for corresponding and first author: [email protected]

  2. arXiv:2406.17800  [pdf, other]

    q-bio.QM cs.SD eess.AS

    Fish Tracking, Counting, and Behaviour Analysis in Digital Aquaculture: A Comprehensive Review

    Authors: Meng Cui, Xubo Liu, Haohe Liu, Jinzheng Zhao, Daoliang Li, Wenwu Wang

    Abstract: Digital aquaculture leverages advanced technologies and data-driven methods, providing substantial benefits over traditional aquaculture practices. Fish tracking, counting, and behaviour analysis are crucial components of digital aquaculture, which are essential for optimizing production efficiency, enhancing fish welfare, and improving resource management. Previous reviews have focused on single… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  3. arXiv:2406.16381  [pdf, other]

    eess.SP

    Polar-Coded Tensor-Based Unsourced Random Access with Soft Decoding

    Authors: Jiaqi Fang, Yan Liang, Gangle Sun, Hongwei Hou, Yafei Wang, Li You, Wenjin Wang

    Abstract: The unsourced random access (URA) has emerged as a viable scheme for supporting the massive machine-type communications (mMTC) in the sixth generation (6G) wireless networks. Notably, the tensor-based URA (TURA), with its inherent tensor structure, stands out by simultaneously enhancing performance and reducing computational complexity for the multi-user separation, especially in mMTC networks wit… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  4. arXiv:2406.16058  [pdf, other]

    eess.AS

    Text-Queried Target Sound Event Localization

    Authors: Jinzheng Zhao, Xinyuan Qian, Yong Xu, Haohe Liu, Yin Cao, Davide Berghi, Wenwu Wang

    Abstract: Sound event localization and detection (SELD) aims to determine the appearance of sound classes, together with their Direction of Arrival (DOA). However, current SELD systems can only predict the activities of specific classes, for example, 13 classes in DCASE challenges. In this paper, we propose text-queried target sound event localization (SEL), a new paradigm that allows the user to input the… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Accepted by EUSIPCO 2024

  5. arXiv:2406.14875  [pdf, other]

    cs.SD eess.AS

    GLOBE: A High-quality English Corpus with Global Accents for Zero-shot Speaker Adaptive Text-to-Speech

    Authors: Wenbin Wang, Yang Song, Sanjay Jha

    Abstract: This paper introduces GLOBE, a high-quality English corpus with worldwide accents, specifically designed to address the limitations of current zero-shot speaker adaptive Text-to-Speech (TTS) systems that exhibit poor generalizability in adapting to speakers with accents. Compared to commonly used English corpora, such as LibriTTS and VCTK, GLOBE is unique in its inclusion of utterances from 23,519… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024, 4 pages, 3 figures

  6. arXiv:2406.11169   

    eess.AS cs.SD

    Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision

    Authors: Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Shiliang Zhang, Wen Wang

    Abstract: Training speaker-discriminative and robust speaker verification systems without explicit speaker labels remains a persisting challenge. In this paper, we propose a new self-supervised speaker verification approach, Self-Distillation Prototypes Network (SDPN), which effectively facilitates self-supervised speaker representation learning. SDPN assigns the representation of the augmented views of an… ▽ More

    Submitted 25 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: We update this paper to an earlier paper

  7. arXiv:2406.10365  [pdf, other]

    eess.SY math.OC

    Multi-Objective Control Co-design Using Graph-Based Optimization for Offshore Wind Farm Grid Integration

    Authors: Himanshu Sharma, Wei Wang, Bowen Huang, Thiagarajan Ramachandran, Veronica Adetola

    Abstract: Offshore wind farms have emerged as a popular renewable energy source that can generate substantial electric power with a low environmental impact. However, integrating these farms into the grid poses significant complexities. To address these issues, optimal-sized energy storage can provide potential solutions and help improve the reliability, efficiency, and flexibility of the grid. Nevertheless… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  8. arXiv:2406.09053  [pdf, ps, other]

    eess.SP

    Joint Channel Estimation and Prediction for Massive MIMO with Frequency Hopping Sounding

    Authors: Yiming Zhu, Jiawei Zhuang, Gangle Sun, Hongwei Hou, Li You, Wenjin Wang

    Abstract: In massive multiple-input multiple-output (MIMO) systems, the downlink transmission performance heavily relies on accurate channel state information (CSI). Constrained by the transmitted power, user equipment always transmits sounding reference signals (SRSs) to the base station through frequency hopping, which will be leveraged to estimate uplink CSI and subsequently predict downlink CSI. This pa… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  9. arXiv:2406.09022  [pdf, other]

    eess.SP

    Towards Unified AI Models for MU-MIMO Communications: A Tensor Equivariance Framework

    Authors: Yafei Wang, Hongwei Hou, Xinping Yi, Wenjin Wang, Shi Jin

    Abstract: In this paper, we propose a unified framework based on equivariance for the design of artificial intelligence (AI)-assisted technologies in multi-user multiple-input-multiple-output (MU-MIMO) systems. We first provide definitions of multidimensional equivariance, high-order equivariance, and multidimensional invariance (referred to collectively as tensor equivariance). On this basis, by investigat… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  10. arXiv:2406.06295  [pdf, other]

    cs.SD eess.AS

    Zero-Shot Audio Captioning Using Soft and Hard Prompts

    Authors: Yiming Zhang, Xuenan Xu, Ruoyi Du, Haohe Liu, Yuan Dong, Zheng-Hua Tan, Wenwu Wang, Zhanyu Ma

    Abstract: In traditional audio captioning methods, a model is usually trained in a fully supervised manner using a human-annotated dataset containing audio-text pairs and then evaluated on the test sets from the same dataset. Such methods have two limitations. First, these methods are often data-hungry and require time-consuming and expensive human annotations to obtain audio-text pairs. Second, these model… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing

  11. arXiv:2406.05914  [pdf, other]

    eess.AS cs.SD eess.SP

    Soundscape Captioning using Sound Affective Quality Network and Large Language Model

    Authors: Yuanbo Hou, Qiaoqiao Ren, Andrew Mitchell, Wenwu Wang, Jian Kang, Tony Belpaeme, Dick Botteldooren

    Abstract: We live in a rich and varied acoustic world, which is experienced by individuals or communities as a soundscape. Computational auditory scene analysis, disentangling acoustic scenes by detecting and classifying events, focuses on objective attributes of sounds, such as their category and temporal characteristics, ignoring the effect of sounds on people and failing to explore the relationship betwe… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Code: https://github.com/Yuanbo2020/SoundSCaper

  12. arXiv:2406.02921  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE eess.AS

    Text Injection for Neural Contextual Biasing

    Authors: Zhong Meng, Zelin Wu, Rohit Prabhavalkar, Cal Peyser, Weiran Wang, Nanxin Chen, Tara N. Sainath, Bhuvana Ramabhadran

    Abstract: Neural contextual biasing effectively improves automatic speech recognition (ASR) for crucial phrases within a speaker's context, particularly those that are infrequent in the training data. This work proposes contextual text injection (CTI) to enhance contextual ASR. CTI leverages not only the paired speech-text data, but also a much larger corpus of unpaired text to optimize the ASR model and it… ▽ More

    Submitted 11 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: 5 pages, 1 figure

    Journal ref: Interspeech 2024, Kos Island, Greece

  13. arXiv:2406.01205  [pdf, other]

    eess.AS cs.LG cs.SD

    ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec

    Authors: Shengpeng Ji, Jialong Zuo, Minghui Fang, Siqi Zheng, Qian Chen, Wen Wang, Ziyue Jiang, Hai Huang, Xize Cheng, Rongjie Huang, Zhou Zhao

    Abstract: In this paper, we present ControlSpeech, a text-to-speech (TTS) system capable of fully cloning the speaker's voice and enabling arbitrary control and adjustment of speaking style, merely based on a few seconds of audio prompt and a simple textual style description prompt. Prior zero-shot TTS models and controllable TTS models either could only mimic the speaker's voice without further control and… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  14. arXiv:2406.00604  [pdf, other]

    eess.SP

    Multipath Exploitation for Fluctuating Target Detection in RIS-Assisted ISAC Systems

    Authors: Shoushuo Zhang, Zichao Xiao, Rang Liu, Ming Li, Wei Wang, Qian Liu

    Abstract: Integrated sensing and communication (ISAC) systems are typically deployed in multipath environments, which is usually deemed as a challenging issue for wireless communications. However, the multipath propagation can also provide extra illumination and observation perspectives for radar sensing, which offers spatial diversity gain for detecting targets with spatial radar cross-section (RCS) fluctu… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: submitted to IEEE WCL

  15. arXiv:2406.00503  [pdf, other]

    math.OC cs.LG eess.SY math-ph stat.ML

    Schrödinger Bridge with Quadratic State Cost is Exactly Solvable

    Authors: Alexis M. H. Teter, Wenqing Wang, Abhishek Halder

    Abstract: Schrödinger bridge is a diffusion process that steers a given distribution to another in a prescribed time while minimizing the effort to do so. It can be seen as the stochastic dynamical version of the optimal mass transport, and has growing applications in generative diffusion models and stochastic optimal control. In this work, we propose a regularized variant of the Schrödinger bridge with a q… ▽ More

    Submitted 16 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

  16. arXiv:2406.00444  [pdf, other]

    eess.SP

    Exploring Channel Estimation and Signal Detection for ODDM-based ISAC Systems

    Authors: Dezhi Wang, Chongwen Huang, Lei Liu, Xiaoming Chen, Wei Wang, Zhaoyang Zhang, Chau Yuen, Mérouane Debbah

    Abstract: Inspired by providing reliable communications for high-mobility scenarios, in this letter, we investigate the channel estimation and signal detection in integrated sensing and communication (ISAC) systems based on the orthogonal delay-Doppler multiplexing (ODDM) modulation, which consists of a pulse-train that can achieve the orthogonality with respect to the resolution of the delay-Doppler (DD) p… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: accepted by IEEE Wireless Communications Letters

  17. arXiv:2405.18435  [pdf, other]

    eess.IV cs.CV

    QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

    Authors: Hongwei Bran Li, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag , et al. (55 additional authors not shown)

    Abstract: Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de… ▽ More

    Submitted 24 June, 2024; v1 submitted 19 March, 2024; originally announced May 2024.

    Comments: initial technical report

  18. arXiv:2405.16257  [pdf, other]

    eess.SP eess.SY

    From Single to Multi-Functional RIS: Architecture, Key Technologies, Challenges, and Applications

    Authors: Wanli Ni, Ailing Zheng, Wen Wang, Dusit Niyato, Naofal Al-Dhahir, Merouane Debbah

    Abstract: Although reconfigurable intelligent surfaces (RISs) have demonstrated the potential to boost network capacity and expand coverage by adjusting their electromagnetic properties, existing RIS architectures have certain limitations, such as double-fading attenuation and restricted half-space coverage. In this article, we delve into the progressive development from single to multi-functional RIS (MF-R… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 9 pages, 6 figures, submitted to IEEE magazines

  19. arXiv:2405.11895  [pdf, other]

    cs.LG eess.SY

    Sparse Attention-driven Quality Prediction for Production Process Optimization in Digital Twins

    Authors: Yanlei Yin, Lihua Wang, Wenbo Wang, Dinh Thai Hoang

    Abstract: In the process industry, optimizing production lines for long-term efficiency requires real-time monitoring and analysis of operation states to fine-tune production line parameters. However, the complexity in operational logic and the intricate coupling of production process parameters make it difficult to develop an accurate mathematical model for the entire process, thus hindering the deployment… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

  20. arXiv:2405.10705  [pdf, other]

    eess.IV cs.CV

    3D Vessel Reconstruction from Sparse-View Dynamic DSA Images via Vessel Probability Guided Attenuation Learning

    Authors: Zhentao Liu, Huangxuan Zhao, Wenhui Qin, Zhenghong Zhou, Xinggang Wang, Wenping Wang, Xiaochun Lai, Chuansheng Zheng, Dinggang Shen, Zhiming Cui

    Abstract: Digital Subtraction Angiography (DSA) is one of the gold standards in vascular disease diagnosing. With the help of contrast agent, time-resolved 2D DSA images deliver comprehensive insights into blood flow information and can be utilized to reconstruct 3D vessel structures. Current commercial DSA systems typically demand hundreds of scanning views to perform reconstruction, resulting in substanti… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 12 pages, 13 figures, 5 tables

  21. arXiv:2405.08169  [pdf, other]

    eess.IV cs.CV

    Rethinking Histology Slide Digitization Workflows for Low-Resource Settings

    Authors: Talat Zehra, Joseph Marino, Wendy Wang, Grigoriy Frantsuzov, Saad Nadeem

    Abstract: Histology slide digitization is becoming essential for telepathology (remote consultation), knowledge sharing (education), and using the state-of-the-art artificial intelligence algorithms (augmented/automated end-to-end clinical workflows). However, the cumulative costs of digital multi-slide high-speed brightfield scanners, cloud/on-premises storage, and personnel (IT and technicians) make the c… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: MICCAI 2024 Early Accept. First four authors contributed equally

  22. arXiv:2405.07685  [pdf, other]

    eess.SY

    Comprehensive Analysis of Access Control Models in Edge Computing: Challenges, Solutions, and Future Directions

    Authors: Tao Xue, Ying Zhang, Yanbin Wang, Wenbo Wang, Shuailou Li, Haibin Zhang

    Abstract: Many contemporary applications, including smart homes and autonomous vehicles, rely on the Internet of Things technology. While cloud computing provides a multitude of valuable services for these applications, it generally imposes constraints on latency-sensitive applications due to the significant propagation delays. As a complementary technique to cloud computing, edge computing situates computi… ▽ More

    Submitted 22 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

  23. arXiv:2405.07443  [pdf, other]

    eess.SY

    Minimum-Variance Recursive State Estimation for 2-D Systems: When Asynchronous Multi-Channel Delays meet Energy Harvesting Constraints

    Authors: Yu Chen, Wei Wang

    Abstract: This paper is concerned with the state estimation problem for two-dimensional systems with asynchronous multichannel delays and energy harvesting constraints. In the system, each smart sensor has a certain probability of harvesting energy from the external environment, the authorized transmission between the sensor and the remote filter is contingent upon the current energy level of the sensor, wh… ▽ More

    Submitted 13 May, 2024; v1 submitted 12 May, 2024; originally announced May 2024.

  24. arXiv:2405.04865  [pdf, ps, other]

    cs.LG eess.SP

    Regime Learning for Differentiable Particle Filters

    Authors: John-Joseph Brady, Yuhui Luo, Wenwu Wang, Victor Elvira, Yunpeng Li

    Abstract: Differentiable particle filters are an emerging class of models that combine sequential Monte Carlo techniques with the flexibility of neural networks to perform state space inference. This paper concerns the case where the system may switch between a finite set of state-space models, i.e. regimes. No prior approaches effectively learn both the individual regimes and the switching process simultan… ▽ More

    Submitted 12 June, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

    MSC Class: 68T37 ACM Class: I.2.6

  25. arXiv:2405.00637  [pdf, ps, other]

    eess.SY

    A Distributed Model Identification Algorithm for Multi-Agent Systems

    Authors: Vivek Khatana, Chin-Yao Chang, Wenbo Wang

    Abstract: In this study, we investigate agent-based approach for system model identification with an emphasis on power distribution system applications. Departing from conventional practices of relying on historical data for offline model identification, we adopt an online update approach utilizing real-time data by employing the latest data points for gradient computation. This methodology offers advantage… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 6 pages, 4 figures

  26. arXiv:2405.00233  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS eess.SP

    SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound

    Authors: Haohe Liu, Xuenan Xu, Yi Yuan, Mengyue Wu, Wenwu Wang, Mark D. Plumbley

    Abstract: Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modelling techniques to audio data. However, traditional codecs often operate at high bitrates or within narrow domains such as speech and lack the semantic clues required for efficient language modelling. Addressing these chal… ▽ More

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: Demo and code: https://haoheliu.github.io/SemantiCodec/

  27. arXiv:2405.00077  [pdf, other]

    cs.LG eess.SP

    BrainODE: Dynamic Brain Signal Analysis via Graph-Aided Neural Ordinary Differential Equations

    Authors: Kaiqiao Han, Yi Yang, Zijie Huang, Xuan Kan, Yang Yang, Ying Guo, Lifang He, Liang Zhan, Yizhou Sun, Wei Wang, Carl Yang

    Abstract: Brain network analysis is vital for understanding the neural interactions regarding brain structures and functions, and identifying potential biomarkers for clinical phenotypes. However, widely used brain signals such as Blood Oxygen Level Dependent (BOLD) time series generated from functional Magnetic Resonance Imaging (fMRI) often manifest three challenges: (1) missing values, (2) irregular samp… ▽ More

    Submitted 30 April, 2024; originally announced May 2024.

  28. arXiv:2404.18094  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    USAT: A Universal Speaker-Adaptive Text-to-Speech Approach

    Authors: Wenbin Wang, Yang Song, Sanjay Jha

    Abstract: Conventional text-to-speech (TTS) research has predominantly focused on enhancing the quality of synthesized speech for speakers in the training dataset. The challenge of synthesizing lifelike speech for unseen, out-of-dataset speakers, especially those with limited reference data, remains a significant and unresolved problem. While zero-shot or few-shot speaker-adaptive TTS approaches have been e… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 15 pages, 13 figures. Copyright has been transferred to IEEE

    Journal ref: IEEE/ACM Transactions on Audio, Speech and Language Processing, 2024

  29. arXiv:2404.18081  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG cs.MM eess.AS

    ComposerX: Multi-Agent Symbolic Music Composition with LLMs

    Authors: Qixin Deng, Qikai Yang, Ruibin Yuan, Yipeng Huang, Yi Wang, Xubo Liu, Zeyue Tian, Jiahao Pan, Ge Zhang, Hanfeng Lin, Yizhi Li, Yinghao Ma, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenwu Wang, Guangyu Xia, Wei Xue, Yike Guo

    Abstract: Music composition represents the creative side of humanity, and itself is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints. While demonstrating impressive capabilities in STEM subjects, current LLMs easily fail in this task, generating ill-written music even when equipped with modern techniques like In-Context-Learning and C… ▽ More

    Submitted 30 April, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

  30. arXiv:2404.17806  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining

    Authors: Yi Yuan, Zhuo Chen, Xubo Liu, Haohe Liu, Xuenan Xu, Dongya Jia, Yuanzhe Chen, Mark D. Plumbley, Wenwu Wang

    Abstract: Contrastive language-audio pretraining (CLAP) has been developed to align the representations of audio and language, achieving remarkable performance in retrieval and classification tasks. However, current CLAP struggles to capture temporal information within audio and text features, presenting substantial limitations for tasks such as audio retrieval and generation. To address this gap, we introd… ▽ More

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Preprint submitted to IEEE MLSP 2024

  31. arXiv:2404.16408  [pdf, other]

    cs.IT eess.SY

    Event-Triggered Resilient Filtering for 2-D Systems with Asynchronous-Delay: Handling Binary Encoding Decoding with Probabilistic Bit Flips

    Authors: Yu Chen, Wei Wang

    Abstract: In this paper, the event-triggered resilient filtering problem is investigated for a class of two-dimensional systems with asynchronous-delay under binary encoding-decoding schemes with probabilistic bit flips. To reduce unnecessary communications and computations in complex network systems, alleviate network energy consumption, and optimize the use of network resources, a new event-triggered mech… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

  32. arXiv:2404.16324  [pdf, other]

    math.NA cs.LG eess.SP

    Improved impedance inversion by deep learning and iterated graph Laplacian

    Authors: Davide Bianchi, Florian Bossmann, Wenlong Wang, Mingming Liu

    Abstract: Deep learning techniques have shown significant potential in many applications through recent years. The achieved results often outperform traditional techniques. However, the quality of a neural network highly depends on the used training data. Noisy, insufficient, or biased training data leads to suboptimal results. We present a hybrid method that combines deep learning with iterated graph Lap… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

    Report number: submitted to SEG Geophysics (June 2024)

  33. arXiv:2404.15311  [pdf, other]

    eess.SP cs.AI cs.LG

    Fusing Pretrained ViTs with TCNet for Enhanced EEG Regression

    Authors: Eric Modesitt, Haicheng Yin, Williams Huang Wang, Brian Lu

    Abstract: The task of Electroencephalogram (EEG) analysis is paramount to the development of Brain-Computer Interfaces (BCIs). However, to reach the goal of developing robust, useful BCIs depends heavily on the speed and the accuracy at which BCIs can understand neural dynamics. In response to that goal, this paper details the integration of pre-trained Vision Transformers (ViTs) with Temporal Convolutional… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted HCI International 2024

  34. arXiv:2404.13372  [pdf, other]

    eess.IV cs.CV

    HybridFlow: Infusing Continuity into Masked Codebook for Extreme Low-Bitrate Image Compression

    Authors: Lei Lu, Yanyue Xie, Wei Jiang, Wei Wang, Xue Lin, Yanzhi Wang

    Abstract: This paper investigates the challenging problem of learned image compression (LIC) with extreme low bitrates. Previous LIC methods based on transmitting quantized continuous features often yield blurry and noisy reconstruction due to the severe quantization loss. While previous LIC methods based on learned codebooks that discretize visual space usually give poor-fidelity reconstruction due to the… ▽ More

    Submitted 20 April, 2024; originally announced April 2024.

  35. arXiv:2404.11313  [pdf, other]

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo , et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  36. arXiv:2404.10180  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE eess.AS

    Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR

    Authors: Zelin Wu, Gan Song, Christopher Li, Pat Rondon, Zhong Meng, Xavier Velez, Weiran Wang, Diamantino Caseiro, Golan Pundak, Tsendsuren Munkhdalai, Angad Chandorkar, Rohit Prabhavalkar

    Abstract: Contextual biasing enables speech recognizers to transcribe important phrases in the speaker's context, such as contact names, even if they are rare in, or absent from, the training data. Attention-based biasing is a leading approach which allows for full end-to-end cotraining of the recognizer and biasing system and requires no separate inference-time components. Such biasers typically consist of… ▽ More

    Submitted 23 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: 9 pages, 3 figures, accepted by NAACL 2024 - Industry Track

    Journal ref: 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics - Industry Track

  37. arXiv:2404.09226  [pdf, other]

    eess.IV cs.CV cs.LG

    Breast Cancer Image Classification Method Based on Deep Transfer Learning

    Authors: Weimin Wang, Min Gao, Mingxuan Xiao, Xu Yan, Yufeng Li

    Abstract: To address the issues of limited samples, time-consuming feature design, and low accuracy in detection and classification of breast cancer pathological images, a breast cancer image classification model algorithm combining deep learning and transfer learning is proposed. This algorithm is based on the DenseNet structure of deep neural networks, and constructs a network model by introducing attenti… ▽ More

    Submitted 14 April, 2024; originally announced April 2024.

  38. arXiv:2404.08713  [pdf, other]

    eess.IV cs.LG q-bio.QM

    Survival Prediction Across Diverse Cancer Types Using Neural Networks

    Authors: Xu Yan, Weimin Wang, MingXuan Xiao, Yufeng Li, Min Gao

    Abstract: Gastric cancer and Colon adenocarcinoma represent widespread and challenging malignancies with high mortality rates and complex treatment landscapes. In response to the critical need for accurate prognosis in cancer patients, the medical community has embraced the 5-year survival rate as a vital metric for estimating patient outcomes. This study introduces a pioneering approach to enhance survival… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

  39. arXiv:2404.08279  [pdf, other]

    eess.IV cs.CV cs.LG

    Convolutional neural network classification of cancer cytopathology images: taking breast cancer as an example

    Authors: MingXuan Xiao, Yufeng Li, Xu Yan, Min Gao, Weimin Wang

    Abstract: Breast cancer is a relatively common cancer among gynecological cancers. Its diagnosis often relies on the pathology of cells in the lesion. The pathological diagnosis of breast cancer not only requires professionals and time, but also sometimes involves subjective judgment. To address the challenges of dependence on pathologists expertise and the time-consuming nature of achieving accurate breast… ▽ More

    Submitted 12 April, 2024; originally announced April 2024.

  40. arXiv:2404.06076  [pdf, other]

    eess.IV

    Image and Video Compression using Generative Sparse Representation with Fidelity Controls

    Authors: Wei Jiang, Wei Wang

    Abstract: We propose a framework for learned image and video compression using the generative sparse visual representation (SVR) guided by fidelity-preserving controls. By embedding inputs into a discrete latent space spanned by learned visual codebooks, SVR-based compression transmits integer codeword indices, which is efficient and cross-platform robust. However, high-quality (HQ) reconstruction in the de… ▽ More

    Submitted 9 April, 2024; originally announced April 2024.

    MSC Class: 68 ACM Class: I.4.2; I.4.4

  41. arXiv:2403.19127  [pdf, ps, other

    eess.SP cs.IT

    Decentralizing Coherent Joint Transmission Precoding via Fast ADMM with Deterministic Equivalents

    Authors: Xinyu Bian, Yuhao Liu, Yizhou Xu, Tianqi Hou, Wenjie Wang, Yuyi Mao, Jun Zhang

    Abstract: Inter-cell interference (ICI) suppression is critical for multi-cell multi-user networks. In this paper, we investigate advanced precoding techniques for coordinated multi-point (CoMP) with downlink coherent joint transmission, an effective approach for ICI suppression. Different from the centralized precoding schemes that require frequent information exchange among the cooperating base stations,…

    Submitted 27 March, 2024; originally announced March 2024.

  42. arXiv:2403.15468  [pdf, other

    eess.SP

    Human Detection in Realistic Through-the-Wall Environments using Raw Radar ADC Data and Parametric Neural Networks

    Authors: Wei Wang, Naike Du, Yuchao Guo, Chao Sun, **gyang Liu, Rencheng Song, Xiuzhu Ye

    Abstract: The radar signal processing algorithm is one of the core components in through-wall radar human detection technology. Traditional algorithms (e.g., DFT and matched filtering) struggle to adaptively handle low signal-to-noise ratio echo signals in challenging and dynamic real-world through-wall application environments, which becomes a major bottleneck in the system. In this paper, we introduce an…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 11 pages, 13 figures

  43. arXiv:2403.14180  [pdf, ps, other

    eess.SP

    Adaptive Target Detection for FDA-MIMO Radar with Training Data in Gaussian noise

    Authors: ** Li, Bang Huang, Wen-Qin Wang

    Abstract: This paper addresses the problem of detecting a moving target embedded in Gaussian noise with an unknown covariance matrix for frequency diverse array multiple-input multiple-output (FDA-MIMO) radar. To this end, we assume that a set of training data is available. Moreover, we propose three adaptive detectors in accordance with the one-step generalized likelihood ratio test (GLRT), two-step G…

    Submitted 21 March, 2024; originally announced March 2024.

  44. arXiv:2403.11845  [pdf

    eess.SP

    Simplified Self-homodyne Coherent System Based on Alamouti Coding and Digital Subcarrier Multiplexing

    Authors: Wei Wang, Dongdong Zou, Zhenpeng Wu, Qi Sui, Xingwen Yi, Fan Li, Chao Lu, Zhaohui Li

    Abstract: Coherent technology inherent with more available degrees of freedom is deemed a competitive solution for next-generation ultra-high-speed short-reach optical interconnects. However, the fatal barriers to implementing the conventional coherent system in short-reach optical interconnect are the cost, footprint, and power consumption. Self-homodyne coherent system exhibits its potential to reduce the power…

    Submitted 18 March, 2024; originally announced March 2024.

  45. arXiv:2403.11092  [pdf, other

    cs.CL cs.AI cs.CV cs.CY eess.IV

    Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts

    Authors: Michael Saxon, Yiran Luo, Sharon Levy, Chitta Baral, Yezhou Yang, William Yang Wang

    Abstract: Benchmarks of the multilingual capabilities of text-to-image (T2I) models compare generated images prompted in a test language to an expected image distribution over a concept set. One such benchmark, "Conceptual Coverage Across Languages" (CoCo-CroLa), assesses the tangible noun inventory of T2I models by prompting them to generate pictures from a concept list translated to seven languages and co…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: NAACL 2024 Main Conference

  46. arXiv:2403.09958  [pdf, other

    eess.SP cs.IT

    Decentralizing Coherent Joint Transmission Precoding via Deterministic Equivalents

    Authors: Yuhao Liu, Xinyu Bian, Yizhou Xu, Tianqi Hou, Wenjie Wang, Yuyi Mao, Jun Zhang

    Abstract: In order to control the inter-cell interference for a multi-cell multi-user multiple-input multiple-output network, we consider the precoder design for coordinated multi-point with downlink coherent joint transmission. To avoid costly information exchange among the cooperating base stations in a centralized precoding scheme, we propose a decentralized one by considering the power minimization prob…

    Submitted 14 March, 2024; originally announced March 2024.

  47. arXiv:2403.09527  [pdf, other

    eess.AS

    WavCraft: Audio Editing and Generation with Large Language Models

    Authors: **hua Liang, Huan Zhang, Haohe Liu, Yin Cao, Qiuqiang Kong, Xubo Liu, Wenwu Wang, Mark D. Plumbley, Huy Phan, Emmanouil Benetos

    Abstract: We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing. Specifically, WavCraft describes the content of raw audio materials in natural language and prompts the LLM conditioned on audio descriptions and user requests. WavCraft leverages the in-context learning ability of the LLM to decompo…

    Submitted 10 May, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  48. arXiv:2403.08236  [pdf, other

    cs.CV eess.IV

    Point Cloud Compression via Constrained Optimal Transport

    Authors: Zezeng Li, Weimin Wang, Ziliang Wang, Na Lei

    Abstract: This paper presents a novel point cloud compression method COT-PCC by formulating the task as a constrained optimal transport (COT) problem. COT-PCC takes the bitrate of compressed features as an extra constraint of optimal transport (OT) which learns the distribution transformation between original and reconstructed points. Specifically, the formulated COT is implemented with a generative adversa…

    Submitted 13 March, 2024; originally announced March 2024.

  49. Integrated Communications and Localization for Massive MIMO LEO Satellite Systems

    Authors: Li You, Xiaoyu Qiang, Yongxiang Zhu, Fan Jiang, Christos G. Tsinos, Wen** Wang, Henk Wymeersch, Xiqi Gao, Björn Ottersten

    Abstract: Integrated communications and localization (ICAL) will play an important part in future sixth generation (6G) networks for the realization of Internet of Everything (IoE) to support both global communications and seamless localization. Massive multiple-input multiple-output (MIMO) low earth orbit (LEO) satellite systems have great potential in providing wide coverage with enhanced gains, and thus…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 14 pages, 7 figures, to appear in IEEE Transactions on Wireless Communications

  50. arXiv:2403.03756  [pdf, ps, other

    eess.SP

    Maximizing Energy Charging for UAV-assisted MEC Systems with SWIPT

    Authors: Xiaoyan Hu, Pengle Wen, Han Xiao, Wenjie Wang, Kai-Kit Wong

    Abstract: An unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) scheme with simultaneous wireless information and power transfer (SWIPT) is proposed in this paper. Unlike existing MEC-WPT schemes that disregard the downlink period for returning computing results to the ground equipment (GEs), our proposed scheme actively considers and capitalizes on this period. By leveraging the SWIPT techni…

    Submitted 6 March, 2024; originally announced March 2024.