Search | arXiv e-print repository

Module control of network analysis in psychopathology

Authors: Chunyu Pan, Quan Zhang, Yue Zhu, Shengzhou Kong, Juan Liu, Changsheng Zhang, Fei Wang, Xizhe Zhang

Abstract: The network approach to characterizing psychopathology departs from traditional latent categorical and dimensional approaches. Causal interplay among symptoms contributed to dynamic psychopathology system. Therefore, analyzing the symptom clusters is critical for understanding mental disorders. Furthermore, despite extensive research studying the topological features of symptom networks, the contr… ▽ More The network approach to characterizing psychopathology departs from traditional latent categorical and dimensional approaches. Causal interplay among symptoms contributed to dynamic psychopathology system. Therefore, analyzing the symptom clusters is critical for understanding mental disorders. Furthermore, despite extensive research studying the topological features of symptom networks, the control relationships between symptoms remain largely unclear. Here, we present a novel systematizing concept, module control, to analyze the control principle of the symptom network at a module level. We introduce Module Control Network (MCN) to identify key modules that regulate the network's behavior. By applying our approach to a multivariate psychological dataset, we discover that non-emotional modules, such as sleep-related and stress-related modules, are the primary controlling modules in the symptom network. Our findings indicate that module control can expose central symptom cluster governing psychopathology network, offering novel insights into the underlying mechanisms of mental disorders and individualized approach to psychological interventions. △ Less

Submitted 30 May, 2024; originally announced July 2024.

arXiv:2406.18054 [pdf, other]

Leveraging Pre-trained Models for FF-to-FFPE Histopathological Image Translation

Authors: Qilai Zhang, Jiawen Li, Peiran Liao, Jiali Hu, Tian Guan, Anjia Han, Yonghong He

Abstract: The two primary types of Hematoxylin and Eosin (H&E) slides in histopathology are Formalin-Fixed Paraffin-Embedded (FFPE) and Fresh Frozen (FF). FFPE slides offer high quality histopathological images but require a labor-intensive acquisition process. In contrast, FF slides can be prepared quickly, but the image quality is relatively poor. Our task is to translate FF images into FFPE style, thereb… ▽ More The two primary types of Hematoxylin and Eosin (H&E) slides in histopathology are Formalin-Fixed Paraffin-Embedded (FFPE) and Fresh Frozen (FF). FFPE slides offer high quality histopathological images but require a labor-intensive acquisition process. In contrast, FF slides can be prepared quickly, but the image quality is relatively poor. Our task is to translate FF images into FFPE style, thereby improving the image quality for diagnostic purposes. In this paper, we propose Diffusion-FFPE, a method for FF-to-FFPE histopathological image translation using a pre-trained diffusion model. Specifically, we employ a one-step diffusion model as the generator and fine-tune it with LoRA adapters using adversarial learning objectives. To ensure that the model effectively captures both global structural information and local details, we propose a multi-scale feature fusion (MFF) module. This module utilizes two VAE encoders to extract features of varying image sizes and performs feature fusion before feeding them into the UNet. Furthermore, we utilize a pre-trained vision-language model for histopathology as the backbone for the discriminator to further improve performance We conducted FF-to-FFPE translation experiments on the TCGA-NSCLC datasets, and our method achieved better performance compared to other methods. The code and models are released at https://github.com/QilaiZhang/Diffusion-FFPE. △ Less

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.17976 [pdf, other]

The Role of Electric Grid Research in Addressing Climate Change

Authors: Le Xie, Subir Majumder, Tong Huang, Qian Zhang, ** Chang, David J. Hill, Mohammad Shahidehpour

Abstract: Addressing the urgency of climate change necessitates a coordinated and inclusive effort from all relevant stakeholders. Critical to this effort is the modeling, analysis, control, and integration of technological innovations within the electric energy system, which plays a crucial role in scaling up climate change solutions. This perspective article presents a set of research challenges and oppor… ▽ More Addressing the urgency of climate change necessitates a coordinated and inclusive effort from all relevant stakeholders. Critical to this effort is the modeling, analysis, control, and integration of technological innovations within the electric energy system, which plays a crucial role in scaling up climate change solutions. This perspective article presents a set of research challenges and opportunities in the area of electric power systems that would be crucial in accelerating Gigaton-level decarbonization. Furthermore, it highlights institutional challenges associated with develo** market mechanisms and regulatory architectures, ensuring that incentives are aligned for stakeholders to effectively implement the technological solutions on a large scale. △ Less

Submitted 25 June, 2024; originally announced June 2024.

Comments: 17 pages, 2 figures

arXiv:2406.12236 [pdf, other]

Binaural Selective Attention Model for Target Speaker Extraction

Authors: Hanyu Meng, Qiquan Zhang, Xiangyu Zhang, Vidhyasaharan Sethu, Eliathamby Ambikairajah

Abstract: The remarkable ability of humans to selectively focus on a target speaker in cocktail party scenarios is facilitated by binaural audio processing. In this paper, we present a binaural time-domain Target Speaker Extraction model based on the Filter-and-Sum Network (FaSNet). Inspired by human selective hearing, our proposed model introduces target speaker embedding into separators using a multi-head… ▽ More The remarkable ability of humans to selectively focus on a target speaker in cocktail party scenarios is facilitated by binaural audio processing. In this paper, we present a binaural time-domain Target Speaker Extraction model based on the Filter-and-Sum Network (FaSNet). Inspired by human selective hearing, our proposed model introduces target speaker embedding into separators using a multi-head attention-based selective attention block. We also compared two binaural interaction approaches -- the cosine similarity of time-domain signals and inter-channel correlation in learned spectral representations. Our experimental results show that our proposed model outperforms monaural configurations and state-of-the-art multi-channel target speaker extraction models, achieving best-in-class performance with 18.52 dB SI-SDR, 19.12 dB SDR, and 3.05 PESQ scores under anechoic two-speaker test configurations. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: Accepted by INTERSPEECH2024

arXiv:2406.11401 [pdf, other]

An Exploration of Length Generalization in Transformer-Based Speech Enhancement

Authors: Qiquan Zhang, Hongxu Zhu, Xinyuan Qian, Eliathamby Ambikairajah, Haizhou Li

Abstract: The use of Transformer architectures has facilitated remarkable progress in speech enhancement. Training Transformers using substantially long speech utterances is often infeasible as self-attention suffers from quadratic complexity. It is a critical and unexplored challenge for a Transformer-based speech enhancement model to learn from short speech utterances and generalize to longer ones. In thi… ▽ More The use of Transformer architectures has facilitated remarkable progress in speech enhancement. Training Transformers using substantially long speech utterances is often infeasible as self-attention suffers from quadratic complexity. It is a critical and unexplored challenge for a Transformer-based speech enhancement model to learn from short speech utterances and generalize to longer ones. In this paper, we conduct comprehensive experiments to explore the length generalization problem in speech enhancement with Transformer. Our findings first establish that position embedding provides an effective instrument to alleviate the impact of utterance length on Transformer-based speech enhancement. Specifically, we explore four different position embedding schemes to enable length generalization. The results confirm the superiority of relative position embeddings (RPEs) over absolute PE (APEs) in length generalization. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: Accepted by INTERSPEECH 2024

arXiv:2406.09317 [pdf, other]

Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, **ming Guo, Xiaolin Chen, **gcheng Wang, Yih Chung Tham , et al. (24 additional authors not shown)

Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. To RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources… ▽ More Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. To RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources, encompassing a diverse range of diseases across multiple ethnicities and countries. RetiZero exhibits superior performance in several downstream tasks, including zero-shot disease recognition, image-to-image retrieval, and internal- and cross-domain disease identification. In zero-shot scenarios, RetiZero achieves Top5 accuracy scores of 0.8430 for 15 fundus diseases and 0.7561 for 52 fundus diseases. For image retrieval, it achieves Top5 scores of 0.9500 and 0.8860 for the same disease sets, respectively. Clinical evaluations show that RetiZero's Top3 zero-shot performance surpasses the average of 19 ophthalmologists from Singapore, China and the United States. Furthermore, RetiZero significantly enhances clinicians' accuracy in diagnosing fundus disease. These findings underscore the value of integrating the RetiZero foundation model into clinical settings, where a variety of fundus diseases are encountered. △ Less

Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08523 [pdf, other]

A Plug-and-Play Untrained Neural Network for Full Waveform Inversion in Reconstructing Sound Speed Images of Ultrasound Computed Tomography

Authors: Weicheng Yan, Qiude Zhang, Yun Wu, Zhaohui Liu, Liang Zhou, Mingyue Ding, Ming Yuchi, Wu Qiu

Abstract: Ultrasound computed tomography (USCT), as an emerging technology, can provide multiple quantitative parametric images of human tissue, such as sound speed and attenuation images, distinguishing it from conventional B-mode (reflection) ultrasound imaging. Full waveform inversion (FWI) is acknowledged as a technique with the greatest potential for reconstructing high-resolution sound speed images in… ▽ More Ultrasound computed tomography (USCT), as an emerging technology, can provide multiple quantitative parametric images of human tissue, such as sound speed and attenuation images, distinguishing it from conventional B-mode (reflection) ultrasound imaging. Full waveform inversion (FWI) is acknowledged as a technique with the greatest potential for reconstructing high-resolution sound speed images in USCT. However, traditional FWI for sound speed image reconstruction suffers from high sensitivity to the initial model caused by its strong non-convex nonlinearity, resulting in poor performance when ultrasound signals are at high frequencies. This limitation significantly restricts the application of FWI in the USCT imaging field. In this paper, we propose an untrained neural network (UNN) that can be integrated into the traditional iteration-based FWI framework as an implicit regularization prior. This integration allows for seamless deployment as a plug-and-play module within existing FWI algorithms or their variants. Notably, the proposed UNN method can be trained in an unsupervised fashion, a vital aspect in medical imaging where ground truth data is often unavailable. Evaluations of the numerical simulation and phantom experiment of the breast demonstrate that the proposed UNN improves the robustness of image reconstruction, reduces image artifacts, and achieves great image contrast. To the best of our knowledge, this study represents the first attempt to propose an implicit UNN for FWI in reconstructing sound speed images for USCT. △ Less

Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.02430 [pdf, other]

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

Authors: Philip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying Huang, Yuanyuan Huo, Dongya Jia, Chumin Li, Feiya Li, Hui Li, Jiaxin Li, Xiaoyang Li, Xingxing Li, Lin Liu, Shouda Liu, Sichao Liu , et al. (21 additional authors not shown)

Abstract: We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and sub… ▽ More We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and subjective evaluations. With fine-tuning, we achieve even higher subjective scores across these metrics. Seed-TTS offers superior controllability over various speech attributes such as emotion and is capable of generating highly expressive and diverse speech for speakers in the wild. Furthermore, we propose a self-distillation method for speech factorization, as well as a reinforcement learning approach to enhance model robustness, speaker similarity, and controllability. We additionally present a non-autoregressive (NAR) variant of the Seed-TTS model, named $\text{Seed-TTS}_\text{DiT}$, which utilizes a fully diffusion-based architecture. Unlike previous NAR-based TTS systems, $\text{Seed-TTS}_\text{DiT}$ does not depend on pre-estimated phoneme durations and performs speech generation through end-to-end processing. We demonstrate that this variant achieves comparable performance to the language model-based variant and showcase its effectiveness in speech editing. We encourage readers to listen to demos at \url{https://bytedancespeech.github.io/seedtts_tech_report}. △ Less

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.01016 [pdf, ps, other]

Sensing, Communication, and Control Co-design for Energy Efficient Satellite-UAV Networks

Authors: Tianhao. Liang, Huahao. Ding, Yuqi. **, Bin. Cao, Tingting. Zhang, Qinyu. Zhang

Abstract: Traditional terrestrial communication infrastructures often fail to collect the timely information from Internet of Thing (IoT) devices in remote areas. To address this challenge, we investigate a Satellite-unmanned aerial vehicles (UAV) integrated Non-terrestrial network (NTN), where the UAV is controlled by remote control center via UAV-to-Satellite connections. To maximize the energy efficiency… ▽ More Traditional terrestrial communication infrastructures often fail to collect the timely information from Internet of Thing (IoT) devices in remote areas. To address this challenge, we investigate a Satellite-unmanned aerial vehicles (UAV) integrated Non-terrestrial network (NTN), where the UAV is controlled by remote control center via UAV-to-Satellite connections. To maximize the energy efficiency (EE) of the UAV, we optimize the UAV trajectory, power allocation, and state sensing strategies, while guaranteing the control stability and communication reliability. This challenging problem is addressed using an efficient algorithm, incorporating a Deep Q-Network (DQN)-based trajectory determination, a closed form of power allocation, and one-dimensional searching for sensing. Numerical simulations are conducted to validate the effectiveness of our approach. The results showcase the data size of collection has a greater impact than transmission power, and reveal the relationship among sensing interval, communication maximum power and control performance. This study provides promising solutions and valuable insights for efficient data collection in remote IoT. △ Less

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.01010 [pdf, ps, other]

Joint Frame Structure, Beamwidth, and Power Allocation for UAV-Aided Localization and Communication

Authors: Tianhao. Liang, Tingting. Zhang, Sheng. Zhou, Wentao. Liu, Dong. Li, Qinyu. Zhang

Abstract: In wireless sensors networks, integrating localization and communications techniques is crucial for efficient spectrum and hardware utilization. In this paper, we present a novel framework of unmanned aerial vehicle (UAV)-aided localization and communication for ground node (GN), where the average spectral efficiency (SE) is used to reveal the intricate relationship among frame structure, channel… ▽ More In wireless sensors networks, integrating localization and communications techniques is crucial for efficient spectrum and hardware utilization. In this paper, we present a novel framework of unmanned aerial vehicle (UAV)-aided localization and communication for ground node (GN), where the average spectral efficiency (SE) is used to reveal the intricate relationship among frame structure, channel estimation error, and localization accuracy. In particular, we first derive the lower bounds for channel estimation error and the three dimensional location prediction error. Leveraging these comprehensive analysis, we formulate a problem to maximize the average SE in UAV-GN communication, where the frame structure, beamwidth and power allocation are jointly optimized. Subsequently, we propose an efficient iterative algorithm to address this non-convex problem with closed-form expressions for beamwidth and power allocation. Numerical results demonstrate that the performance of our proposed method can approach the upper bound with much lower complexity, and achieve over 70\% performance gain compared to non-localization benchmarks. Additionally, the analysis highlights the dominant impacts from the Doppler effect over noise on the average SE. △ Less

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.15831 [pdf, other]

Transmission Interface Power Flow Adjustment: A Deep Reinforcement Learning Approach based on Multi-task Attribution Map

Authors: Shunyu Liu, Wei Luo, Yanzhen Zhou, Kaixuan Chen, Quan Zhang, Huating Xu, Qinglai Guo, Mingli Song

Abstract: Transmission interface power flow adjustment is a critical measure to ensure the security and economy operation of power systems. However, conventional model-based adjustment schemes are limited by the increasing variations and uncertainties occur in power systems, where the adjustment problems of different transmission interfaces are often treated as several independent tasks, ignoring their coup… ▽ More Transmission interface power flow adjustment is a critical measure to ensure the security and economy operation of power systems. However, conventional model-based adjustment schemes are limited by the increasing variations and uncertainties occur in power systems, where the adjustment problems of different transmission interfaces are often treated as several independent tasks, ignoring their coupling relationship and even leading to conflict decisions. In this paper, we introduce a novel data-driven deep reinforcement learning (DRL) approach, to handle multiple power flow adjustment tasks jointly instead of learning each task from scratch. At the heart of the proposed method is a multi-task attribution map (MAM), which enables the DRL agent to explicitly attribute each transmission interface task to different power system nodes with task-adaptive attention weights. Based on this MAM, the agent can further provide effective strategies to solve the multi-task adjustment problem with a near-optimal operation cost. Simulation results on the IEEE 118-bus system, a realistic 300-bus system in China, and a very large European system with 9241 buses demonstrate that the proposed method significantly improves the performance compared with several baseline methods, and exhibits high interpretability with the learnable MAM. △ Less

Submitted 24 May, 2024; originally announced May 2024.

Comments: Accepted by IEEE Transactions on Power Systems

arXiv:2405.12609 [pdf, other]

Mamba in Speech: Towards an Alternative to Self-Attention

Authors: Xiangyu Zhang, Qiquan Zhang, Hexin Liu, Tianyi Xiao, Xinyuan Qian, Beena Ahmed, Eliathamby Ambikairajah, Haizhou Li, Julien Epps

Abstract: Transformer and its derivatives have achieved success in diverse tasks across computer vision, natural language processing, and speech processing. To reduce the complexity of computations within the multi-head self-attention mechanism in Transformer, Selective State Space Models (i.e., Mamba) were proposed as an alternative. Mamba exhibited its effectiveness in natural language processing and comp… ▽ More Transformer and its derivatives have achieved success in diverse tasks across computer vision, natural language processing, and speech processing. To reduce the complexity of computations within the multi-head self-attention mechanism in Transformer, Selective State Space Models (i.e., Mamba) were proposed as an alternative. Mamba exhibited its effectiveness in natural language processing and computer vision tasks, but its superiority has rarely been investigated in speech signal processing. This paper explores solutions for applying Mamba to speech processing using two typical speech processing tasks: speech recognition, which requires semantic and sequential information, and speech enhancement, which focuses primarily on sequential patterns. The experimental results exhibit the superiority of bidirectional Mamba (BiMamba) for speech processing to vanilla Mamba. Moreover, experiments demonstrate the effectiveness of BiMamba as an alternative to the self-attention module in Transformer and its derivates, particularly for the semantic-aware task. The crucial technologies for transferring Mamba to speech are then summarized in ablation studies and the discussion section to offer insights for future research. △ Less

Submitted 30 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.06516 [pdf, ps, other]

An Efficient Algorithm for Sum-Rate Maximization in Fluid Antenna-Assisted ISAC System

Authors: Qian Zhang, Mingjie Shao, Tong Zhang, Gaojie Chen, Ju Liu

Abstract: In this letter, we investigate the fluid antenna (FA)-assisted integrated sensing and communication (ISAC) system, where communication and radar sensing employ the co-waveform design. Specifically, we focus on the beamformer design and antenna position configuration to realize a higher communication rate while guaranteeing the minimum radar probing power. Different from existing beamformer algorit… ▽ More In this letter, we investigate the fluid antenna (FA)-assisted integrated sensing and communication (ISAC) system, where communication and radar sensing employ the co-waveform design. Specifically, we focus on the beamformer design and antenna position configuration to realize a higher communication rate while guaranteeing the minimum radar probing power. Different from existing beamformer algorithms, we propose an efficient proximal distance algorithm (PDA) to solve the multiuser sum-rate maximization problem with radar sensing constraint to obtain the closed-form beamforming vector. In addition, we develop an extrapolated projected gradient (EPG) algorithm to obtain a better antenna location configuration for FA to enhance the ISAC performance. Numerical results show that the considered FA-assisted ISAC system enjoys a higher sum-rate by the proposed algorithm, compared with that in existing non-FA ISAC systems. △ Less

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2404.16223 [pdf, other]

Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey

Authors: Marcos V. Conde, Florin-Alexandru Vasluianu, Radu Timofte, Jianxing Zhang, Jia Li, Fan Wang, Xiaopeng Li, Zikun Liu, Hyunhee Park, Sejun Song, Changho Kim, Zhijuan Huang, Hongyuan Yu, Cheng Wan, Wending Xiang, Jiamin Lin, Hang Zhong, Qiaosong Zhang, Yue Sun, Xuanwu Yin, Kunlong Zuo, Senyan Xu, Siyuan Jiang, Zhi**g Sun, Jiaying Zhu , et al. (10 additional authors not shown)

Abstract: This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines, however, this problem is not as explored as in the RGB domain. Th goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as nois… ▽ More This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines, however, this problem is not as explored as in the RGB domain. Th goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as noise and blur. In the challenge, a total of 230 participants registered, and 45 submitted results during thee challenge period. The performance of the top-5 submissions is reviewed and provided here as a gauge for the current state-of-the-art in RAW Image Super-Resolution. △ Less

Submitted 24 April, 2024; originally announced April 2024.

Comments: CVPR 2024 - NTIRE Workshop

arXiv:2404.15609 [pdf, other]

doi 10.1016/j.jprocont.2024.103173

Dynamic fault detection and diagnosis for alkaline water electrolyzer with variational Bayesian Sparse principal component analysis

Authors: Qi Zhang, Weihua Xu, Lei Xie, Hongye Su

Abstract: Electrolytic hydrogen production serves as not only a vital source of green hydrogen but also a key strategy for addressing renewable energy consumption challenges. For the safe production of hydrogen through alkaline water electrolyzer (AWE), dependable process monitoring technology is essential. However, random noise can easily contaminate the AWE process data collected in industrial settings, p… ▽ More Electrolytic hydrogen production serves as not only a vital source of green hydrogen but also a key strategy for addressing renewable energy consumption challenges. For the safe production of hydrogen through alkaline water electrolyzer (AWE), dependable process monitoring technology is essential. However, random noise can easily contaminate the AWE process data collected in industrial settings, presenting new challenges for monitoring methods. In this study, we develop the variational Bayesian sparse principal component analysis (VBSPCA) method for process monitoring. VBSPCA methods based on Gaussian prior and Laplace prior are derived to obtain the sparsity of the projection matrix, which corresponds to $\ell_2$ regularization and $\ell_1$ regularization, respectively. The correlation of dynamic latent variables is then analyzed by sparse autoregression and fault variables are diagnosed by fault reconstruction. The effectiveness of the method is verified by an industrial hydrogen production process, and the test results demonstrated that both Gaussian prior and Laplace prior based VBSPCA can effectively detect and diagnose critical faults in AWEs. △ Less

Submitted 23 April, 2024; originally announced April 2024.

Journal ref: Journal of Process Control, 135:103173, March 2024. ISSN 0959-1524

arXiv:2404.11938 [pdf, other]

HyDiscGAN: A Hybrid Distributed cGAN for Audio-Visual Privacy Preservation in Multimodal Sentiment Analysis

Authors: Zhuojia Wu, Qi Zhang, Duoqian Miao, Kun Yi, Wei Fan, Liang Hu

Abstract: Multimodal Sentiment Analysis (MSA) aims to identify speakers' sentiment tendencies in multimodal video content, raising serious concerns about privacy risks associated with multimodal data, such as voiceprints and facial images. Recent distributed collaborative learning has been verified as an effective paradigm for privacy preservation in multimodal tasks. However, they often overlook the privac… ▽ More Multimodal Sentiment Analysis (MSA) aims to identify speakers' sentiment tendencies in multimodal video content, raising serious concerns about privacy risks associated with multimodal data, such as voiceprints and facial images. Recent distributed collaborative learning has been verified as an effective paradigm for privacy preservation in multimodal tasks. However, they often overlook the privacy distinctions among different modalities, struggling to strike a balance between performance and privacy preservation. Consequently, it poses an intriguing question of maximizing multimodal utilization to improve performance while simultaneously protecting necessary modalities. This paper forms the first attempt at modality-specified (i.e., audio and visual) privacy preservation in MSA tasks. We propose a novel Hybrid Distributed cross-modality cGAN framework (HyDiscGAN), which learns multimodality alignment to generate fake audio and visual features conditioned on shareable de-identified textual data. The objective is to leverage the fake features to approximate real audio and visual content to guarantee privacy preservation while effectively enhancing performance. Extensive experiments show that compared with the state-of-the-art MSA model, HyDiscGAN can achieve superior or competitive performance while preserving privacy. △ Less

Submitted 18 April, 2024; originally announced April 2024.

Comments: 13 pages, IJCAI-2024

arXiv:2404.09519 [pdf, other]

Nonlinear sparse variational Bayesian learning based model predictive control with application to PEMFC temperature control

Authors: Qi Zhang, Lei Wang, Weihua Xu, Hongye Su, Lei Xie

Abstract: The accuracy of the underlying model predictions is crucial for the success of model predictive control (MPC) applications. If the model is unable to accurately analyze the dynamics of the controlled system, the performance and stability guarantees provided by MPC may not be achieved. Learning-based MPC can learn models from data, improving the applicability and reliability of MPC. This study deve… ▽ More The accuracy of the underlying model predictions is crucial for the success of model predictive control (MPC) applications. If the model is unable to accurately analyze the dynamics of the controlled system, the performance and stability guarantees provided by MPC may not be achieved. Learning-based MPC can learn models from data, improving the applicability and reliability of MPC. This study develops a nonlinear sparse variational Bayesian learning based MPC (NSVB-MPC) for nonlinear systems, where the model is learned by the developed NSVB method. Variational inference is used by NSVB-MPC to assess the predictive accuracy and make the necessary corrections to quantify system uncertainty. The suggested approach ensures input-to-state (ISS) and the feasibility of recursive constraints in accordance with the concept of an invariant terminal region. Finally, a PEMFC temperature control model experiment confirms the effectiveness of the NSVB-MPC method. △ Less

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.00622 [pdf, other]

OpenMines: A Light and Comprehensive Mining Simulation Environment for Truck Dispatching

Authors: Shi Meng, Bin Tian, Xiaotong Zhang, Shuangying Qi, Caiji Zhang, Qiang Zhang

Abstract: Mine fleet management algorithms can significantly reduce operational costs and enhance productivity in mining systems. Most current fleet management algorithms are evaluated based on self-implemented or proprietary simulation environments, posing challenges for replication and comparison. This paper models the simulation environment for mine fleet management from a complex systems perspective. Bu… ▽ More Mine fleet management algorithms can significantly reduce operational costs and enhance productivity in mining systems. Most current fleet management algorithms are evaluated based on self-implemented or proprietary simulation environments, posing challenges for replication and comparison. This paper models the simulation environment for mine fleet management from a complex systems perspective. Building upon previous work, we introduce probabilistic, user-defined events for random event simulation and implement various evaluation metrics and baselines, effectively reflecting the robustness of fleet management algorithms against unforeseen incidents. We present ``OpenMines'', an open-source framework encompassing the entire process of mine system modeling, algorithm development, and evaluation, facilitating future algorithm comparison and replication in the field. Code is available in https://github.com/370025263/openmines. △ Less

Submitted 31 March, 2024; originally announced April 2024.

Comments: accepted in: 2024 35th IEEE Intelligent Vehicles Symposium (IV) 4 figures, 1 table

arXiv:2404.00608 [pdf, other]

Sample Complexity of Chance Constrained Optimization in Dynamic Environment

Authors: Apurv Shukla, Qian Zhang, Le Xie

Abstract: We study the scenario approach for solving chance-constrained optimization in time-coupled dynamic environments. Scenario generation methods approximate the true feasible region from scenarios generated independently and identically from the actual distribution. In this paper, we consider this problem in a dynamic environment, where the scenarios are assumed to be drawn sequentially from an unknow… ▽ More We study the scenario approach for solving chance-constrained optimization in time-coupled dynamic environments. Scenario generation methods approximate the true feasible region from scenarios generated independently and identically from the actual distribution. In this paper, we consider this problem in a dynamic environment, where the scenarios are assumed to be drawn sequentially from an unknown and time-varying distribution. Such dynamic environments are driven by changing environmental conditions that could be found in many real-world applications such as energy systems. We couple the time-varying distributions using the Wasserstein metric between the sequence of scenario-generating distributions and the actual chance-constrained distribution. Our main results are bounds on the number of samples essential for ensuring the ex-post risk in chance-constrained optimization problems when the underlying feasible set is convex or non-convex. Finally, our results are illustrated on multiple numerical experiments for both types of feasible sets. △ Less

Submitted 31 March, 2024; originally announced April 2024.

Comments: To apper in American Control Conference 2024

arXiv:2403.16677 [pdf, other]

FOOL: Addressing the Downlink Bottleneck in Satellite Computing with Neural Feature Compression

Authors: Alireza Furutanpey, Qiyang Zhang, Philipp Raith, Tobias Pfandzelter, Shangguang Wang, Schahram Dustdar

Abstract: Nanosatellite constellations equipped with sensors capturing large geographic regions provide unprecedented opportunities for Earth observation. As constellation sizes increase, network contention poses a downlink bottleneck. Orbital Edge Computing (OEC) leverages limited onboard compute resources to reduce transfer costs by processing the raw captures at the source. However, current solutions hav… ▽ More Nanosatellite constellations equipped with sensors capturing large geographic regions provide unprecedented opportunities for Earth observation. As constellation sizes increase, network contention poses a downlink bottleneck. Orbital Edge Computing (OEC) leverages limited onboard compute resources to reduce transfer costs by processing the raw captures at the source. However, current solutions have limited practicability due to reliance on crude filtering methods or over-prioritizing particular downstream tasks. This work presents FOOL, an OEC-native and task-agnostic feature compression method that preserves prediction performance. FOOL partitions high-resolution satellite imagery to maximize throughput. Further, it embeds context and leverages inter-tile dependencies to lower transfer costs with negligible overhead. While FOOL is a feature compressor, it can recover images with competitive scores on perceptual quality measures at lower bitrates. We extensively evaluate transfer cost reduction by including the peculiarity of intermittently available network connections in low earth orbit. Lastly, we test the feasibility of our system for standardized nanosatellite form factors. We demonstrate that FOOL permits downlinking over 100x the data volume without relying on prior information on the downstream tasks. △ Less

Submitted 25 March, 2024; originally announced March 2024.

Comments: 18 pages, double column, 19 figures, 7 tables, Initial Submission to IEEE Transactions on Mobile Computing

arXiv:2403.11556 [pdf, other]

Hierarchical Frequency-based Upsampling and Refining for Compressed Video Quality Enhancement

Authors: Qianyu Zhang, Bolun Zheng, Xinying Chen, Quan Chen, Zhunjie Zhu, Can** Wang, Zongpeng Li, Chengang Yan

Abstract: Video compression artifacts arise due to the quantization operation in the frequency domain. The goal of video quality enhancement is to reduce compression artifacts and reconstruct a visually-pleasant result. In this work, we propose a hierarchical frequency-based upsampling and refining neural network (HFUR) for compressed video quality enhancement. HFUR consists of two modules: implicit frequen… ▽ More Video compression artifacts arise due to the quantization operation in the frequency domain. The goal of video quality enhancement is to reduce compression artifacts and reconstruct a visually-pleasant result. In this work, we propose a hierarchical frequency-based upsampling and refining neural network (HFUR) for compressed video quality enhancement. HFUR consists of two modules: implicit frequency upsampling module (ImpFreqUp) and hierarchical and iterative refinement module (HIR). ImpFreqUp exploits DCT-domain prior derived through implicit DCT transform, and accurately reconstructs the DCT-domain loss via a coarse-to-fine transfer. Consequently, HIR is introduced to facilitate cross-collaboration and information compensation between the scales, thus further refine the feature maps and promote the visual quality of the final output. We demonstrate the effectiveness of the proposed modules via ablation experiments and visualized results. Extensive experiments on public benchmarks show that HFUR achieves state-of-the-art performance for both constant bit rate and constant QP modes. △ Less

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.11405 [pdf, other]

A Deep Learning Method for Beat-Level Risk Analysis and Interpretation of Atrial Fibrillation Patients during Sinus Rhythm

Authors: Jun Lei, Yuxi Zhou, Xue Tian, Qinghao Zhao, Qi Zhang, Shijia Geng, Qingbo Wu, Shenda Hong

Abstract: Atrial Fibrillation (AF) is a common cardiac arrhythmia. Many AF patients experience complications such as stroke and other cardiovascular issues. Early detection of AF is crucial. Existing algorithms can only distinguish ``AF rhythm in AF patients'' from ``sinus rhythm in normal individuals'' . However, AF patients do not always exhibit AF rhythm, posing a challenge for diagnosis when the AF rhyt… ▽ More Atrial Fibrillation (AF) is a common cardiac arrhythmia. Many AF patients experience complications such as stroke and other cardiovascular issues. Early detection of AF is crucial. Existing algorithms can only distinguish ``AF rhythm in AF patients'' from ``sinus rhythm in normal individuals'' . However, AF patients do not always exhibit AF rhythm, posing a challenge for diagnosis when the AF rhythm is absent. To address this, this paper proposes a novel artificial intelligence (AI) algorithm to distinguish ``sinus rhythm in AF patients'' and ``sinus rhythm in normal individuals'' in beat-level. We introduce beat-level risk interpreters, trend risk interpreters, addressing the interpretability issues of deep learning models and the difficulty in explaining AF risk trends. Additionally, the beat-level information fusion decision is presented to enhance model accuracy. The experimental results demonstrate that the average AUC for single beats used as testing data from CPSC 2021 dataset is 0.7314. By employing 150 beats for information fusion decision algorithm, the average AUC can reach 0.7591. Compared to previous segment-level algorithms, we utilized beats as input, reducing data dimensionality and making the model more lightweight, facilitating deployment on portable medical devices. Furthermore, we draw new and interesting findings through average beat analysis and subgroup analysis, considering varying risk levels. △ Less

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.08434 [pdf, other]

GRF-based Predictive Flocking Control with Dynamic Pattern Formation

Authors: Chenghao Yu, Dengyu Zhang, Qingrui Zhang

Abstract: It is promising but challenging to design flocking control for a robot swarm to autonomously follow changing patterns or shapes in a optimal distributed manner. The optimal flocking control with dynamic pattern formation is, therefore, investigated in this paper. A predictive flocking control algorithm is proposed based on a Gibbs random field (GRF), where bio-inspired potential energies are used… ▽ More It is promising but challenging to design flocking control for a robot swarm to autonomously follow changing patterns or shapes in a optimal distributed manner. The optimal flocking control with dynamic pattern formation is, therefore, investigated in this paper. A predictive flocking control algorithm is proposed based on a Gibbs random field (GRF), where bio-inspired potential energies are used to charaterize ``robot-robot'' and ``robot-environment'' interactions. Specialized performance-related energies, e.g., motion smoothness, are introduced in the proposed design to improve the flocking behaviors. The optimal control is obtained by maximizing a posterior distribution of a GRF. A region-based shape control is accomplished for pattern formation in light of a mean shift technique. The proposed algorithm is evaluated via the comparison with two state-of-the-art flocking control methods in an environment with obstacles. Both numerical simulations and real-world experiments are conducted to demonstrate the efficiency of the proposed design. △ Less

Submitted 13 March, 2024; originally announced March 2024.

Comments: Accepted by ICRA 2024

arXiv:2403.02565 [pdf, other]

Deep Cooperation in ISAC System: Resource, Node and Infrastructure Perspectives

Authors: Zhiqing Wei, Haotian Liu, Zhiyong Feng, Huici Wu, Fan Liu, Qixun Zhang, Yucong Du

Abstract: With the emerging Integrated Sensing and Communication (ISAC) technique, exploiting the mobile communication system with multi-domain resources, multiple network elements, and large-scale infrastructures to realize cooperative sensing is a crucial approach satisfying the requirements of high-accuracy and large-scale sensing in IoE. In this article, the deep cooperation in ISAC system including thr… ▽ More With the emerging Integrated Sensing and Communication (ISAC) technique, exploiting the mobile communication system with multi-domain resources, multiple network elements, and large-scale infrastructures to realize cooperative sensing is a crucial approach satisfying the requirements of high-accuracy and large-scale sensing in IoE. In this article, the deep cooperation in ISAC system including three perspectives is investigated. In the microscopic perspective, namely, within a single node, the sensing information carried by time-frequency-space-code domain resources is processed, such as phase compensation, coherent accumulation and other operations, thereby improving the sensing accuracy. In the mesoscopic perspective, the sensing accuracy could be improved through the cooperation of multiple nodes. We explore various multi-node cooperative sensing scenarios and present the corresponding challenges and future research trends. In the macroscopic perspective, the massive number of infrastructures from the same operator or different operators could perform cooperative sensing to extend the sensing coverage and improve the sensing continuity. We investigate network architecture, target tracking methods, and the large-scale sensing assisted digital twin construction. Simulation results demonstrate the superiority of multi-nodes and multi-resources cooperative sensing over single resource or node sensing. This article may provide a deep and comprehensive view on the cooperative sensing in ISAC system to enhance the performance of sensing, supporting the applications of IoE. △ Less

Submitted 29 May, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: 8 pages and 6 figures, Accepted by IEEE Internet of Things Magazine

arXiv:2403.02236 [pdf, other]

Interpretable Models for Detecting and Monitoring Elevated Intracranial Pressure

Authors: Darryl Hannan, Steven C. Nesbit, Ximing Wen, Glen Smith, Qiao Zhang, Alberto Goffi, Vincent Chan, Michael J. Morris, John C. Hunninghake, Nicholas E. Villalobos, Edward Kim, Rosina O. Weber, Christopher J. MacLellan

Abstract: Detecting elevated intracranial pressure (ICP) is crucial in diagnosing and managing various neurological conditions. These fluctuations in pressure are transmitted to the optic nerve sheath (ONS), resulting in changes to its diameter, which can then be detected using ultrasound imaging devices. However, interpreting sonographic images of the ONS can be challenging. In this work, we propose two sy… ▽ More Detecting elevated intracranial pressure (ICP) is crucial in diagnosing and managing various neurological conditions. These fluctuations in pressure are transmitted to the optic nerve sheath (ONS), resulting in changes to its diameter, which can then be detected using ultrasound imaging devices. However, interpreting sonographic images of the ONS can be challenging. In this work, we propose two systems that actively monitor the ONS diameter throughout an ultrasound video and make a final prediction as to whether ICP is elevated. To construct our systems, we leverage subject matter expert (SME) guidance, structuring our processing pipeline according to their collection procedure, while also prioritizing interpretability and computational efficiency. We conduct a number of experiments, demonstrating that our proposed systems are able to outperform various baselines. One of our SMEs then manually validates our top system's performance, lending further credibility to our approach while demonstrating its potential utility in a clinical setting. △ Less

Submitted 4 March, 2024; originally announced March 2024.

Comments: 5 pages, 2 figures, ISBI 2024

arXiv:2402.14285 [pdf, other]

Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion

Authors: Yujia Huang, Adishree Ghatare, Yuanzhe Liu, Ziniu Hu, Qinsheng Zhang, Chandramouli S Sastry, Siddharth Gururani, Sageev Oore, Yisong Yue

Abstract: We study the problem of symbolic music generation (e.g., generating piano rolls), with a technical focus on non-differentiable rule guidance. Musical rules are often expressed in symbolic form on note characteristics, such as note density or chord progression, many of which are non-differentiable which pose a challenge when using them for guided diffusion. We propose \oursfull (\ours), a novel gui… ▽ More We study the problem of symbolic music generation (e.g., generating piano rolls), with a technical focus on non-differentiable rule guidance. Musical rules are often expressed in symbolic form on note characteristics, such as note density or chord progression, many of which are non-differentiable which pose a challenge when using them for guided diffusion. We propose \oursfull (\ours), a novel guidance method that only requires forward evaluation of rule functions that can work with pre-trained diffusion models in a plug-and-play way, thus achieving training-free guidance for non-differentiable rules for the first time. Additionally, we introduce a latent diffusion architecture for symbolic music generation with high time resolution, which can be composed with SCG in a plug-and-play fashion. Compared to standard strong baselines in symbolic music generation, this framework demonstrates marked advancements in music quality and rule-based controllability, outperforming current state-of-the-art generators in a variety of settings. For detailed demonstrations, code and model checkpoints, please visit our project website: https://scg-rule-guided-music.github.io/. △ Less

Submitted 2 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: ICML 2024 (Oral)

arXiv:2402.13276 [pdf, other]

When LLMs Meets Acoustic Landmarks: An Efficient Approach to Integrate Speech into Large Language Models for Depression Detection

Authors: Xiangyu Zhang, Hexin Liu, Kaishuai Xu, Qiquan Zhang, Daijiao Liu, Beena Ahmed, Julien Epps

Abstract: Depression is a critical concern in global mental health, prompting extensive research into AI-based detection methods. Among various AI technologies, Large Language Models (LLMs) stand out for their versatility in mental healthcare applications. However, their primary limitation arises from their exclusive dependence on textual input, which constrains their overall capabilities. Furthermore, the… ▽ More Depression is a critical concern in global mental health, prompting extensive research into AI-based detection methods. Among various AI technologies, Large Language Models (LLMs) stand out for their versatility in mental healthcare applications. However, their primary limitation arises from their exclusive dependence on textual input, which constrains their overall capabilities. Furthermore, the utilization of LLMs in identifying and analyzing depressive states is still relatively untapped. In this paper, we present an innovative approach to integrating acoustic speech information into the LLMs framework for multimodal depression detection. We investigate an efficient method for depression detection by integrating speech signals into LLMs utilizing Acoustic Landmarks. By incorporating acoustic landmarks, which are specific to the pronunciation of spoken words, our method adds critical dimensions to text transcripts. This integration also provides insights into the unique speech patterns of individuals, revealing the potential mental states of individuals. Evaluations of the proposed approach on the DAIC-WOZ dataset reveal state-of-the-art results when compared with existing Audio-Text baselines. In addition, this approach is not only valuable for the detection of depression but also represents a new perspective in enhancing the ability of LLMs to comprehend and process speech signals. △ Less

Submitted 17 February, 2024; originally announced February 2024.

arXiv:2402.11898 [pdf, other]

Automatic Radio Map Adaptation for Robust Localization with Dynamic Adversarial Learning

Authors: Lingyan Zhang, Junlin Huang, Tingting Zhang, Qinyu Zhang

Abstract: Wireless fingerprint-based localization has become one of the most promising technologies for ubiquitous location-aware computing and intelligent location-based services. However, due to RF vulnerability to environmental dynamics over time, continuous radio map updates are time-consuming and infeasible, resulting in severe accuracy degradation. To address this issue, we propose a novel approach of… ▽ More Wireless fingerprint-based localization has become one of the most promising technologies for ubiquitous location-aware computing and intelligent location-based services. However, due to RF vulnerability to environmental dynamics over time, continuous radio map updates are time-consuming and infeasible, resulting in severe accuracy degradation. To address this issue, we propose a novel approach of robust localization with dynamic adversarial learning, known as DadLoc which realizes automatic radio map adaptation by incorporating multiple robust factors underlying RF fingerprints to learn the evolving feature representation with the complicated environmental dynamics. DadLoc performs a finer-grained distribution adaptation with the developed dynamic adversarial adaptation network and quantifies the contributions of both global and local distribution adaptation in a dynamics-adaptive manner. Furthermore, we adopt the strategy of prediction uncertainty suppression to conduct source-supervised training, target-unsupervised training, and source-target dynamic adversarial adaptation which can trade off the environment adaptability and the location discriminability of the learned deep representation for safe and effective feature transfer across different environments. With extensive experimental results, the satisfactory accuracy over other comparative schemes demonstrates that the proposed DanLoc can facilitate fingerprint-based localization for wide deployments. △ Less

Submitted 19 February, 2024; originally announced February 2024.

Comments: 11 pages, 11 figures

arXiv:2402.10642 [pdf, other]

Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model

Authors: Xiangyu Zhang, Daijiao Liu, Hexin Liu, Qiquan Zhang, Hanyu Meng, Leibny Paola Garcia, Eng Siong Chng, Lina Yao

Abstract: Recently, Denoising Diffusion Probabilistic Models (DDPMs) have attained leading performances across a diverse range of generative tasks. However, in the field of speech synthesis, although DDPMs exhibit impressive performance, their long training duration and substantial inference costs hinder practical deployment. Existing approaches primarily focus on enhancing inference speed, while approaches… ▽ More Recently, Denoising Diffusion Probabilistic Models (DDPMs) have attained leading performances across a diverse range of generative tasks. However, in the field of speech synthesis, although DDPMs exhibit impressive performance, their long training duration and substantial inference costs hinder practical deployment. Existing approaches primarily focus on enhancing inference speed, while approaches to accelerate training a key factor in the costs associated with adding or customizing voices often necessitate complex modifications to the model, compromising their universal applicability. To address the aforementioned challenges, we propose an inquiry: is it possible to enhance the training/inference speed and performance of DDPMs by modifying the speech signal itself? In this paper, we double the training and inference speed of Speech DDPMs by simply redirecting the generative target to the wavelet domain. This method not only achieves comparable or superior performance to the original model in speech synthesis tasks but also demonstrates its versatility. By investigating and utilizing different wavelet bases, our approach proves effective not just in speech synthesis, but also in speech enhancement. △ Less

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.05390 [pdf, other]

Integrated Sensing and Communication Driven Digital Twin for Intelligent Machine Network

Authors: Zhiqing Wei, Yucong Du, Qixun Zhang, Wangjun Jiang, Yanpeng Cui, Zeyang Meng, Huici Wu, Zhiyong Feng

Abstract: Intelligent machines (IMs), including industrial machines, unmanned aerial vehicles (UAVs), and unmanned vehicles, etc., could perform effective cooperation in complex environment when they form IM network. The efficient environment sensing and communication are crucial for IM network, enabling the real-time and stable control of IMs. With the emergence of integrated sensing and communication (ISA… ▽ More Intelligent machines (IMs), including industrial machines, unmanned aerial vehicles (UAVs), and unmanned vehicles, etc., could perform effective cooperation in complex environment when they form IM network. The efficient environment sensing and communication are crucial for IM network, enabling the real-time and stable control of IMs. With the emergence of integrated sensing and communication (ISAC) technology, IM network is empowered with ubiquitous sensing capabilities, which is helpful in improving the efficiency of communication and sensing with the mutual benefit of them. However, the massive amount of sensing information brings challenges for the processing, storage and application of sensing information. In this article, ISAC driven digital twin (DT) is proposed for IM network, and the architecture and enabling technologies are revealed. ISAC driven DT structurally stores the sensing information, which is further applied to optimize communication, networking and control schemes of IMs, promoting the widespread applications of IMs. △ Less

Submitted 7 February, 2024; originally announced February 2024.

Comments: 9 pages, 5 figures, 1 Table

ACM Class: C.2.1

arXiv:2402.04566 [pdf]

Triplet-constraint Transformer with Multi-scale Refinement for Dose Prediction in Radiotherapy

Authors: Lu Wen, Qihun Zhang, Zhenghao Feng, Yuanyuan Xu, Xiao Chen, Jiliu Zhou, Yan Wang

Abstract: Radiotherapy is a primary treatment for cancers with the aim of applying sufficient radiation dose to the planning target volume (PTV) while minimizing dose hazards to the organs at risk (OARs). Convolutional neural networks (CNNs) have automated the radiotherapy plan-making by predicting the dose maps. However, current CNN-based methods ignore the remarkable dose difference in the dose map, i.e.,… ▽ More Radiotherapy is a primary treatment for cancers with the aim of applying sufficient radiation dose to the planning target volume (PTV) while minimizing dose hazards to the organs at risk (OARs). Convolutional neural networks (CNNs) have automated the radiotherapy plan-making by predicting the dose maps. However, current CNN-based methods ignore the remarkable dose difference in the dose map, i.e., high dose value in the interior PTV while low value in the exterior PTV, leading to a suboptimal prediction. In this paper, we propose a triplet-constraint transformer (TCtrans) with multi-scale refinement to predict the high-quality dose distribution. Concretely, a novel PTV-guided triplet constraint is designed to refine dose feature representations in the interior and exterior PTV by utilizing the explicit geometry of PTV. Furthermore, we introduce a multi-scale refinement (MSR) module to effectively fulfill the triplet constraint in different decoding layers with multiple scales. Besides, a transformer encoder is devised to learn the important global dosimetric knowledge. Experiments on a clinical cervical cancer dataset demonstrate the superiority of our method. △ Less

Submitted 6 February, 2024; originally announced February 2024.

Comments: accepted by 2024 IEEE ISBI

arXiv:2401.16778 [pdf, other]

Secure ISAC MIMO Systems: Exploiting Interference With Bayesian Cramér-Rao Bound Optimization

Authors: Nanchi Su, Fan Liu, Christos Masouros, George C. Alexandropoulos, Yifeng Xiong, Qinyu Zhang

Abstract: In this paper, we present a signaling design for secure integrated sensing and communication (ISAC) systems comprising a dual-functional multi-input multi-output (MIMO) base station (BS) that simultaneously communicates with multiple users while detecting targets present in their vicinity, which are regarded as potential eavesdroppers. In particular, assuming that the distribution of each paramete… ▽ More In this paper, we present a signaling design for secure integrated sensing and communication (ISAC) systems comprising a dual-functional multi-input multi-output (MIMO) base station (BS) that simultaneously communicates with multiple users while detecting targets present in their vicinity, which are regarded as potential eavesdroppers. In particular, assuming that the distribution of each parameter to be estimated is known \textit{a priori}, we focus on optimizing the targets' sensing performance. To this end, we derive and minimize the Bayesian Cramér-Rao bound (BCRB), while ensuring certain communication quality of service (QoS) by exploiting constructive interference (CI). The latter scheme enforces that the received signals at the eavesdrop** targets fall into the destructive region of the signal constellation, to deteriorate their decoding probability, thus enhancing the ISAC's system physical-layer security (PLS) capability. To tackle the nonconvexity of the formulated problem, a tailored successive convex approximation method is proposed for its efficient solution. Our extensive numerical results verify the effectiveness of the proposed secure ISAC design showing that the proposed algorithm outperforms block-level precoding techniques. △ Less

Submitted 30 January, 2024; originally announced January 2024.

Comments: 6 pages, 4 figures, submitted for journal publication

arXiv:2401.12167 [pdf, other]

Dynamic Semantic Compression for CNN Inference in Multi-access Edge Computing: A Graph Reinforcement Learning-based Autoencoder

Authors: Nan Li, Alexandros Iosifidis, Qi Zhang

Abstract: This paper studies the computational offloading of CNN inference in dynamic multi-access edge computing (MEC) networks. To address the uncertainties in communication time and computation resource availability, we propose a novel semantic compression method, autoencoder-based CNN architecture (AECNN), for effective semantic extraction and compression in partial offloading. In the semantic encoder,… ▽ More This paper studies the computational offloading of CNN inference in dynamic multi-access edge computing (MEC) networks. To address the uncertainties in communication time and computation resource availability, we propose a novel semantic compression method, autoencoder-based CNN architecture (AECNN), for effective semantic extraction and compression in partial offloading. In the semantic encoder, we introduce a feature compression module based on the channel attention mechanism in CNNs, to compress intermediate data by selecting the most informative features. In the semantic decoder, we design a lightweight decoder to reconstruct the intermediate data through learning from the received compressed data to improve accuracy. To effectively trade-off communication, computation, and inference accuracy, we design a reward function and formulate the offloading problem of CNN inference as a maximization problem with the goal of maximizing the average inference accuracy and throughput over the long term. To address this maximization problem, we propose a graph reinforcement learning-based AECNN (GRL-AECNN) method, which outperforms existing works DROO-AECNN, GRL-BottleNet++ and GRL-DeepJSCC under different dynamic scenarios. This highlights the advantages of GRL-AECNN in offloading decision-making in dynamic MEC. △ Less

Submitted 19 January, 2024; originally announced January 2024.

Comments: arXiv admin note: text overlap with arXiv:2211.13745

arXiv:2401.09686 [pdf, other]

An Empirical Study on the Impact of Positional Encoding in Transformer-based Monaural Speech Enhancement

Authors: Qiquan Zhang, Meng Ge, Hongxu Zhu, Eliathamby Ambikairajah, Qi Song, Zhaoheng Ni, Haizhou Li

Abstract: Transformer architecture has enabled recent progress in speech enhancement. Since Transformers are position-agostic, positional encoding is the de facto standard component used to enable Transformers to distinguish the order of elements in a sequence. However, it remains unclear how positional encoding exactly impacts speech enhancement based on Transformer architectures. In this paper, we perform… ▽ More Transformer architecture has enabled recent progress in speech enhancement. Since Transformers are position-agostic, positional encoding is the de facto standard component used to enable Transformers to distinguish the order of elements in a sequence. However, it remains unclear how positional encoding exactly impacts speech enhancement based on Transformer architectures. In this paper, we perform a comprehensive empirical study evaluating five positional encoding methods, i.e., Sinusoidal and learned absolute position embedding (APE), T5-RPE, KERPLE, as well as the Transformer without positional encoding (No-Pos), across both causal and noncausal configurations. We conduct extensive speech enhancement experiments, involving spectral map** and masking methods. Our findings establish that positional encoding is not quite helpful for the models in a causal configuration, which indicates that causal attention may implicitly incorporate position information. In a noncausal configuration, the models significantly benefit from the use of positional encoding. In addition, we find that among the four position embeddings, relative position embeddings outperform APEs. △ Less

Submitted 13 February, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

Comments: Accepted by ICASSP 2024

arXiv:2401.09510 [pdf, other]

Community Detection in the Multi-View Stochastic Block Model

Authors: Yexin Zhang, Zhongtian Ma, Qiaosheng Zhang, Zhen Wang, Xuelong Li

Abstract: This paper considers the problem of community detection on multiple potentially correlated graphs from an information-theoretical perspective. We first put forth a random graph model, called the multi-view stochastic block model (MVSBM), designed to generate correlated graphs on the same set of nodes (with cardinality $n$). The $n$ nodes are partitioned into two disjoint communities of equal size.… ▽ More This paper considers the problem of community detection on multiple potentially correlated graphs from an information-theoretical perspective. We first put forth a random graph model, called the multi-view stochastic block model (MVSBM), designed to generate correlated graphs on the same set of nodes (with cardinality $n$). The $n$ nodes are partitioned into two disjoint communities of equal size. The presence or absence of edges in the graphs for each pair of nodes depends on whether the two nodes belong to the same community or not. The objective for the learner is to recover the hidden communities with observed graphs. Our technical contributions are two-fold: (i) We establish an information-theoretic upper bound (Theorem~1) showing that exact recovery of community is achievable when the model parameters of MVSBM exceed a certain threshold. (ii) Conversely, we derive an information-theoretic lower bound (Theorem~2) showing that when the model parameters of MVSBM fall below the aforementioned threshold, then for any estimator, the expected number of misclassified nodes will always be greater than one. Our results for the MVSBM recover several prior results for community detection in the standard SBM as well as in multiple independent SBMs as special cases. △ Less

Submitted 17 January, 2024; originally announced January 2024.

Comments: Submitted to IEEE for possible publication

arXiv:2401.08197 [pdf, other]

Matrix Completion with Hypergraphs:Sharp Thresholds and Efficient Algorithms

Authors: Zhongtian Ma, Qiaosheng Zhang, Zhen Wang

Abstract: This paper considers the problem of completing a rating matrix based on sub-sampled matrix entries as well as observed social graphs and hypergraphs. We show that there exists a \emph{sharp threshold} on the sample probability for the task of exactly completing the rating matrix -- the task is achievable when the sample probability is above the threshold, and is impossible otherwise -- demonstrati… ▽ More This paper considers the problem of completing a rating matrix based on sub-sampled matrix entries as well as observed social graphs and hypergraphs. We show that there exists a \emph{sharp threshold} on the sample probability for the task of exactly completing the rating matrix -- the task is achievable when the sample probability is above the threshold, and is impossible otherwise -- demonstrating a phase transition phenomenon. The threshold can be expressed as a function of the ``quality'' of hypergraphs, enabling us to \emph{quantify} the amount of reduction in sample probability due to the exploitation of hypergraphs. This also highlights the usefulness of hypergraphs in the matrix completion problem. En route to discovering the sharp threshold, we develop a computationally efficient matrix completion algorithm that effectively exploits the observed graphs and hypergraphs. Theoretical analyses show that our algorithm succeeds with high probability as long as the sample probability exceeds the aforementioned threshold, and this theoretical result is further validated by synthetic experiments. Moreover, our experiments on a real social network dataset (with both graphs and hypergraphs) show that our algorithm outperforms other state-of-the-art matrix completion algorithms. △ Less

Submitted 17 January, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Comments: Submitted to IEEE for possible publication

arXiv:2401.07139 [pdf, other]

doi 10.1109/TGRS.2023.3291822

Deep Blind Super-Resolution for Satellite Video

Authors: Yi Xiao, Qiangqiang Yuan, Qiang Zhang, Liangpei Zhang

Abstract: Recent efforts have witnessed remarkable progress in Satellite Video Super-Resolution (SVSR). However, most SVSR methods usually assume the degradation is fixed and known, e.g., bicubic downsampling, which makes them vulnerable in real-world scenes with multiple and unknown degradations. To alleviate this issue, blind SR has thus become a research hotspot. Nevertheless, existing approaches are mai… ▽ More Recent efforts have witnessed remarkable progress in Satellite Video Super-Resolution (SVSR). However, most SVSR methods usually assume the degradation is fixed and known, e.g., bicubic downsampling, which makes them vulnerable in real-world scenes with multiple and unknown degradations. To alleviate this issue, blind SR has thus become a research hotspot. Nevertheless, existing approaches are mainly engaged in blur kernel estimation while losing sight of another critical aspect for VSR tasks: temporal compensation, especially compensating for blurry and smooth pixels with vital sharpness from severely degraded satellite videos. Therefore, this paper proposes a practical Blind SVSR algorithm (BSVSR) to explore more sharp cues by considering the pixel-wise blur levels in a coarse-to-fine manner. Specifically, we employed multi-scale deformable convolution to coarsely aggregate the temporal redundancy into adjacent frames by window-slid progressive fusion. Then the adjacent features are finely merged into mid-feature using deformable attention, which measures the blur levels of pixels and assigns more weights to the informative pixels, thus inspiring the representation of sharpness. Moreover, we devise a pyramid spatial transformation module to adjust the solution space of sharp mid-feature, resulting in flexible feature adaptation in multi-level domains. Quantitative and qualitative evaluations on both simulated and real-world satellite videos demonstrate that our BSVSR performs favorably against state-of-the-art non-blind and blind SR models. Code will be available at https://github.com/XY-boy/Blind-Satellite-VSR △ Less

Submitted 13 January, 2024; originally announced January 2024.

Comments: Published in IEEE TGRS

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1-16, 2023, Art no. 5516316

arXiv:2401.06204 [pdf, other]

An Exploratory Assessment of LLM's Potential Toward Flight Trajectory Reconstruction Analysis

Authors: Qilei Zhang, John H. Mott

Abstract: Large Language Models (LLMs) hold transformative potential in aviation, particularly in reconstructing flight trajectories. This paper investigates this potential, grounded in the notion that LLMs excel at processing sequential data and deciphering complex data structures. Utilizing the LLaMA 2 model, a pre-trained open-source LLM, the study focuses on reconstructing flight trajectories using Auto… ▽ More Large Language Models (LLMs) hold transformative potential in aviation, particularly in reconstructing flight trajectories. This paper investigates this potential, grounded in the notion that LLMs excel at processing sequential data and deciphering complex data structures. Utilizing the LLaMA 2 model, a pre-trained open-source LLM, the study focuses on reconstructing flight trajectories using Automatic Dependent Surveillance-Broadcast (ADS-B) data with irregularities inherent in real-world scenarios. The findings demonstrate the model's proficiency in filtering noise and estimating both linear and curved flight trajectories. However, the analysis also reveals challenges in managing longer data sequences, which may be attributed to the token length limitations of LLM models. The study's insights underscore the promise of LLMs in flight trajectory reconstruction and open new avenues for their broader application across the aviation and transportation sectors. △ Less

Submitted 11 January, 2024; originally announced January 2024.

Comments: 6 pages

arXiv:2401.06098 [pdf, other]

Proximal observers for secure state estimation

Authors: Laurent Bako, Madiha Nadri, Vincent Andrieu, Qinghua Zhang

Abstract: This paper discusses a general framework for designing robust state estimators for a class of discrete-time nonlinear systems. We consider systems that may be impacted by impulsive (sparse but otherwise arbitrary) measurement noise sequences. We show that a family of state estimators, robust to this type of undesired signal, can be obtained by minimizing a class of nonsmooth convex functions at ea… ▽ More This paper discusses a general framework for designing robust state estimators for a class of discrete-time nonlinear systems. We consider systems that may be impacted by impulsive (sparse but otherwise arbitrary) measurement noise sequences. We show that a family of state estimators, robust to this type of undesired signal, can be obtained by minimizing a class of nonsmooth convex functions at each time step. The resulting state observers are defined through proximal operators. We obtain a nonlinear implicit dynamical system in term of estimation error and prove, in the noise-free setting, that it vanishes asymptotically when the minimized loss function and the to-beobserved system enjoy appropriate properties. From a computational perspective, even though the proposed observers can be implemented via efficient numerical procedures, they do not admit closed-form expressions. The paper argues that by adopting appropriate relaxations, simple and fast analytic expressions can be derived. △ Less

Submitted 11 January, 2024; originally announced January 2024.

Comments: 15 pages, 5 figures

arXiv:2401.02771 [pdf, other]

Powerformer: A Section-adaptive Transformer for Power Flow Adjustment

Authors: Kaixuan Chen, Wei Luo, Shunyu Liu, Yaoquan Wei, Yihe Zhou, Yunpeng Qing, Quan Zhang, Jie Song, Mingli Song

Abstract: In this paper, we present a novel transformer architecture tailored for learning robust power system state representations, which strives to optimize power dispatch for the power flow adjustment across different transmission sections. Specifically, our proposed approach, named Powerformer, develops a dedicated section-adaptive attention mechanism, separating itself from the self-attention used in… ▽ More In this paper, we present a novel transformer architecture tailored for learning robust power system state representations, which strives to optimize power dispatch for the power flow adjustment across different transmission sections. Specifically, our proposed approach, named Powerformer, develops a dedicated section-adaptive attention mechanism, separating itself from the self-attention used in conventional transformers. This mechanism effectively integrates power system states with transmission section information, which facilitates the development of robust state representations. Furthermore, by considering the graph topology of power system and the electrical attributes of bus nodes, we introduce two customized strategies to further enhance the expressiveness: graph neural network propagation and multi-factor attention mechanism. Extensive evaluations are conducted on three power system scenarios, including the IEEE 118-bus system, a realistic 300-bus system in China, and a large-scale European system with 9241 buses, where Powerformer demonstrates its superior performance over several baseline methods. △ Less

Submitted 30 January, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

Comments: 8 figures

arXiv:2401.00194 [pdf, ps, other]

On the Identifiability from Modulo Measurements under DFT Sensing Matrix

Authors: Qi Zhang, Jiang Zhu, Fengzhong Qu, Zheng Zhu, De Wen Soh

Abstract: Unlimited sampling was recently introduced to deal with the clip** or saturation of measurements where a modulo operator is applied before sampling. In this paper, we investigate the identifiability of the model where measurements are acquired under a discrete Fourier transform (DFT) sensing matrix first followed by a modulo operator (modulo-DFT). Firstly, based on the theorems of cyclotomic pol… ▽ More Unlimited sampling was recently introduced to deal with the clip** or saturation of measurements where a modulo operator is applied before sampling. In this paper, we investigate the identifiability of the model where measurements are acquired under a discrete Fourier transform (DFT) sensing matrix first followed by a modulo operator (modulo-DFT). Firstly, based on the theorems of cyclotomic polynomials, we derive a sufficient condition for uniquely identifying the original signal in modulo-DFT. Additionally, for periodic bandlimited signals (PBSs) under unlimited sampling which can be viewed as a special case of modulo-DFT, the necessary and sufficient condition for the unique recovery of the original signal are provided. Moreover, we show that when the oversampling factor exceeds $3(1+1/P)$, PBS is always identifiable from the modulo samples, where $P$ is the number of harmonics including the fundamental component in the positive frequency part. △ Less

Submitted 30 December, 2023; originally announced January 2024.

arXiv:2312.08097 [pdf, ps, other]

Hierarchical Cognitive Spectrum Sharing in Space-Air-Ground Integrated Networks

Authors: Zizhen Zhou, Qianqian Zhang, Jungang Ge, Ying-Chang Liang

Abstract: In space-air-ground integrated networks (SAGINs), cognitive spectrum sharing has been regarded as a promising solution to improve spectrum efficiency by enabling a secondary network to access the spectrum of a primary network. However, different networks in SAGIN may have different quality of service (QoS) requirements, which can not be well satisfied with the traditional cognitive spectrum sharin… ▽ More In space-air-ground integrated networks (SAGINs), cognitive spectrum sharing has been regarded as a promising solution to improve spectrum efficiency by enabling a secondary network to access the spectrum of a primary network. However, different networks in SAGIN may have different quality of service (QoS) requirements, which can not be well satisfied with the traditional cognitive spectrum sharing architecture. For example, the aerial network typically has high QoS requirements, which however may not be met when it acts as a secondary network. To address this issue, in this paper, we propose a hierarchical cognitive spectrum sharing architecture (HCSSA) for SAGINs, where the secondary networks are divided into a preferential one and an ordinary one. Specifically, the aerial and terrestrial networks can access the spectrum of the satellite network under the condition that the caused interference to the satellite terminal is below a certain threshold. Besides, considering that the aerial network has a higher priority than the terrestrial network, we aim to use a rate constraint to ensure the performance of the aerial network. Subject to these two constraints, we consider a sum-rate maximization for the terrestrial network by jointly optimizing the transmit beamforming vectors of the aerial and terrestrial base stations. To solve this non-convex problem, we propose a penalty-based iterative beamforming (PIBF) scheme that uses the penalty method and the successive convex approximation technique. Moreover, we also develop three low-complexity schemes by optimizing the normalized beamforming vectors and power control. Finally, we provide extensive numerical simulations to compare the performance of the proposed PIBF scheme and the low-complexity schemes. The results also demonstrate the advantages of the proposed HCSSA compared with the traditional cognitive spectrum sharing architecture. △ Less

Submitted 13 December, 2023; originally announced December 2023.

arXiv:2312.07941 [pdf, ps, other]

An efficient algorithm for multiuser sum-rate maximization of large-scale active RIS-aided MIMO system

Authors: Qian Zhang, Mingjie Shao, Qiang Li, Ju Liu

Abstract: Active reconfigurable intelligent surface (RIS) is a new RIS architecture that can reflect and amplify communication signals. It can provide enhanced performance gain compared to the conventional passive RIS systems that can only reflect the signals. On the other hand, the design problem of active RIS-aided systems is more challenging than the passive RIS-aided systems and its efficient algorithms… ▽ More Active reconfigurable intelligent surface (RIS) is a new RIS architecture that can reflect and amplify communication signals. It can provide enhanced performance gain compared to the conventional passive RIS systems that can only reflect the signals. On the other hand, the design problem of active RIS-aided systems is more challenging than the passive RIS-aided systems and its efficient algorithms are less studied. In this paper, we consider the sum rate maximization problem in the multiuser massive multiple-input single-output (MISO) downlink with the aid of a large-scale active RIS. Existing approaches for handling this problem usually resort to general optimization solvers and can be computationally prohibitive. We propose an efficient block successive upper bound minimization (BSUM) method, of which each step has a (semi) closed-form update. Thus, the proposed algorithm has an attractive low per-iteration complexity. By simulation, our proposed algorithm consumes much less computation than the existing approaches. In particular, when the MIMO and/or RIS sizes are large, our proposed algorithm can be orders-of-magnitude faster than existing approaches. △ Less

Submitted 11 January, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

Comments: ICASSP 2024

arXiv:2311.16192 [pdf, other]

Utilizing Multiple Inputs Autoregressive Models for Bearing Remaining Useful Life Prediction

Authors: Junliang Wang, Qinghua Zhang, Guanhua Zhu, Guoxi Sun

Abstract: Accurate prediction of the Remaining Useful Life (RUL) of rolling bearings is crucial in industrial production, yet existing models often struggle with limited generalization capabilities due to their inability to fully process all vibration signal patterns. We introduce a novel multi-input autoregressive model to address this challenge in RUL prediction for bearings. Our approach uniquely integra… ▽ More Accurate prediction of the Remaining Useful Life (RUL) of rolling bearings is crucial in industrial production, yet existing models often struggle with limited generalization capabilities due to their inability to fully process all vibration signal patterns. We introduce a novel multi-input autoregressive model to address this challenge in RUL prediction for bearings. Our approach uniquely integrates vibration signals with previously predicted Health Indicator (HI) values, employing feature fusion to output current window HI values. Through autoregressive iterations, the model attains a global receptive field, effectively overcoming the limitations in generalization. Furthermore, we innovatively incorporate a segmentation method and multiple training iterations to mitigate error accumulation in autoregressive models. Empirical evaluation on the PMH2012 dataset demonstrates that our model, compared to other backbone networks using similar autoregressive approaches, achieves significantly lower Root Mean Square Error (RMSE) and Score. Notably, it outperforms traditional autoregressive models that use label values as inputs and non-autoregressive networks, showing superior generalization abilities with a marked lead in RMSE and Score metrics. △ Less

Submitted 26 November, 2023; originally announced November 2023.

arXiv:2311.15164 [pdf]

doi 10.1002/lpor.202400187

Neural-Optic Co-Designed Polarization-Multiplexed Metalens for Compact Computational Spectral Imaging

Authors: Qiangbo Zhang, Peicheng Lin, Chang Wang, Yang Zhang, Zeqing Yu, Xinyu Liu, Ting Xu, Zhenrong Zheng

Abstract: As the realm of spectral imaging applications extends its reach into the domains of mobile technology and augmented reality, the demands for compact yet high-fidelity systems become increasingly pronounced. Conventional methodologies, exemplified by coded aperture snapshot spectral imaging systems, are significantly limited by their cumbersome physical dimensions and form factors. To address this… ▽ More As the realm of spectral imaging applications extends its reach into the domains of mobile technology and augmented reality, the demands for compact yet high-fidelity systems become increasingly pronounced. Conventional methodologies, exemplified by coded aperture snapshot spectral imaging systems, are significantly limited by their cumbersome physical dimensions and form factors. To address this inherent challenge, diffractive optical elements (DOEs) have been repeatedly employed as a means to mitigate issues related to the bulky nature of these systems. Nonetheless, it's essential to note that the capabilities of DOEs primarily revolve around the modulation of the phase of light. Here, we introduce an end-to-end computational spectral imaging framework based on a polarization-multiplexed metalens. A distinguishing feature of this approach lies in its capacity to simultaneously modulate orthogonal polarization channels. When harnessed in conjunction with a neural network, it facilitates the attainment of high-fidelity spectral reconstruction. Importantly, the framework is intrinsically fully differentiable, a feature that permits the joint optimization of both the metalens structure and the parameters governing the neural network. The experimental results presented herein validate the exceptional spatial-spectral reconstruction performance, underscoring the efficacy of this system in practical, real-world scenarios. This innovative approach transcends the traditional boundaries separating hardware and software in the realm of computational imaging and holds the promise of substantially propelling the miniaturization of spectral imaging systems. △ Less

Submitted 25 November, 2023; originally announced November 2023.

arXiv:2311.10525 [pdf, other]

Utilizing VQ-VAE for End-to-End Health Indicator Generation in Predicting Rolling Bearing RUL

Authors: Junliang Wang, Qinghua Zhang, Guanhua Zhu, Guoxi Sun

Abstract: The prediction of the remaining useful life (RUL) of rolling bearings is a pivotal issue in industrial production. A crucial approach to tackling this issue involves transforming vibration signals into health indicators (HI) to aid model training. This paper presents an end-to-end HI construction method, vector quantised variational autoencoder (VQ-VAE), which addresses the need for dimensionality… ▽ More The prediction of the remaining useful life (RUL) of rolling bearings is a pivotal issue in industrial production. A crucial approach to tackling this issue involves transforming vibration signals into health indicators (HI) to aid model training. This paper presents an end-to-end HI construction method, vector quantised variational autoencoder (VQ-VAE), which addresses the need for dimensionality reduction of latent variables in traditional unsupervised learning methods such as autoencoder. Moreover, concerning the inadequacy of traditional statistical metrics in reflecting curve fluctuations accurately, two novel statistical metrics, mean absolute distance (MAD) and mean variance (MV), are introduced. These metrics accurately depict the fluctuation patterns in the curves, thereby indicating the model's accuracy in discerning similar features. On the PMH2012 dataset, methods employing VQ-VAE for label construction achieved lower values for MAD and MV. Furthermore, the ASTCN prediction model trained with VQ-VAE labels demonstrated commendable performance, attaining the lowest values for MAD and MV. △ Less

Submitted 17 November, 2023; originally announced November 2023.

Comments: 17 figures

arXiv:2311.07157 [pdf, other]

Communication-Assisted Sensing in 6G Networks

Authors: Fuwang Dong, Fan Liu, Shihang Lu, Yifeng Xiong, Qixun Zhang, Zhiyong Feng

Abstract: Exploring the mutual benefit and reciprocity of sensing and communication (S\&C) functions is fundamental to realizing deeper integration for integrated sensing and communication (ISAC) systems. This paper investigates a novel communication-assisted sensing (CAS) system within 6G perceptive networks, where the base station actively senses the targets through device-free wireless sensing and simult… ▽ More Exploring the mutual benefit and reciprocity of sensing and communication (S\&C) functions is fundamental to realizing deeper integration for integrated sensing and communication (ISAC) systems. This paper investigates a novel communication-assisted sensing (CAS) system within 6G perceptive networks, where the base station actively senses the targets through device-free wireless sensing and simultaneously transmits the estimated information to end-users. In such a CAS system, we first establish an optimal waveform design framework based on the rate-distortion (RD) and source-channel separation (SCT) theorems. After analyzing the relationships between the sensing distortion, coding rate, and communication channel capacity, we propose two distinct waveform design strategies in the scenario of target impulse response estimation. In the separated S\&C waveforms scheme, we equivalently transform the original problem into a power allocation problem and develop a low-complexity one-dimensional search algorithm, shedding light on a notable power allocation tradeoff between the S\&C waveform. In the dual-functional waveform scheme, we conceive a heuristic mutual information optimization algorithm for the general case, alongside a modified gradient projection algorithm tailored for the scenarios with independent sensing sub-channels. Additionally, we identify the presence of both subspace tradeoff and water-filling tradeoff in this scheme. Finally, we validate the effectiveness of the proposed algorithms through numerical simulations. △ Less

Submitted 15 March, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

arXiv:2311.04534 [pdf, other]

Loss Masking Is Not Needed in Decoder-only Transformer for Discrete-token-based ASR

Authors: Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Yukun Ma, Hai Yu, Jiaqing Liu, Chong Zhang

Abstract: Recently, unified speech-text models, such as SpeechGPT, VioLA, and AudioPaLM, have achieved remarkable performance on various speech tasks. These models discretize speech signals into tokens (speech discretization) and use a shared vocabulary for both text and speech tokens. Then they train a single decoder-only Transformer on a mixture of speech tasks. However, these models rely on the Loss Mask… ▽ More Recently, unified speech-text models, such as SpeechGPT, VioLA, and AudioPaLM, have achieved remarkable performance on various speech tasks. These models discretize speech signals into tokens (speech discretization) and use a shared vocabulary for both text and speech tokens. Then they train a single decoder-only Transformer on a mixture of speech tasks. However, these models rely on the Loss Masking strategy for the ASR task, which ignores the dependency among speech tokens. In this paper, we propose to model speech tokens in an autoregressive way, similar to text. We find that applying the conventional cross-entropy loss on input speech tokens does not consistently improve the ASR performance over the Loss Masking approach. To address this issue, we propose a novel approach denoted Smoothed Label Distillation (SLD), which applies a KL divergence loss with smoothed labels on speech tokens. Our experiments show that SLD effectively models speech tokens and outperforms Loss Masking for decoder-only Transformers in ASR tasks with different speech discretization methods. The source code can be found here: https://github.com/alibaba-damo-academy/SpokenNLP/tree/main/sld △ Less

Submitted 4 February, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

Comments: 5 pages, accepted by ICASSP 2024

arXiv:2311.02928 [pdf, ps, other]

doi 10.1109/TWC.2023.3286395

Pilot Design and Signal Detection for Symbiotic Radio over OFDM Carriers

Authors: Hao Chen, Qianqian Zhang, Ruizhe Long, Yiyang Pei, Ying-Chang Liang

Abstract: Symbiotic radio (SR) is a promising solution to achieve high spectrum- and energy-efficiency due to its spectrum sharing and low-power consumption properties, in which the secondary system achieves data transmissions by backscattering the signal originating from the primary system. In this paper, we are interested in the pilot design and signal detection when the primary transmission adopts orthog… ▽ More Symbiotic radio (SR) is a promising solution to achieve high spectrum- and energy-efficiency due to its spectrum sharing and low-power consumption properties, in which the secondary system achieves data transmissions by backscattering the signal originating from the primary system. In this paper, we are interested in the pilot design and signal detection when the primary transmission adopts orthogonal frequency division multiplexing (OFDM). In particular, to preserve the channel orthogonality among the OFDM sub-carriers, each secondary symbol is designed to span an entire OFDM symbol. The comb-type pilot structure is employed by the primary transmission, while the preamble pilot structure is used by the secondary transmission. With the designed pilot structures, the primary signal can be detected via the conventional methods by treating the secondary signal as a part of the composite channel, i.e., the effective channel of the primary transmission. Furthermore, the secondary signal can be extracted from the estimated composite channel with the help of the detected primary signal. The bit error rate (BER) performance with both perfect and estimated CSI, the diversity orders of the primary and secondary transmissions, and the sensitivity to symbol synchronization error are analyzed. Simulation results show that the performance of the primary transmission is enhanced thanks to the backscatter link established by the secondary transmission. More importantly, even without the direct link, the primary and secondary transmissions can be supported via only the backscatter link. △ Less

Submitted 6 November, 2023; originally announced November 2023.

Comments: This paper has been accepted for publication in IEEE Transactions on Wireless Communications

Journal ref: IEEE Transactions on Wireless Communications, early access, 2023

arXiv:2311.02250 [pdf, other]

Efficient Scenario Generation for Chance-constrained Economic Dispatch Considering Ambient Wind Conditions

Authors: Qian Zhang, Apurv Shukla, Le Xie

Abstract: Scenario generation is an effective data-driven method for solving chance-constrained optimization while ensuring desired risk guarantees with a finite number of samples. Crucial challenges in deploying this technique in the real world arise due to the absence of appropriate risk-tuning models tailored for the desired application. In this paper, we focus on designing efficient scenario generation… ▽ More Scenario generation is an effective data-driven method for solving chance-constrained optimization while ensuring desired risk guarantees with a finite number of samples. Crucial challenges in deploying this technique in the real world arise due to the absence of appropriate risk-tuning models tailored for the desired application. In this paper, we focus on designing efficient scenario generation schemes for economic dispatch in power systems. We propose a novel scenario generation method based on filtering scenarios using ambient wind conditions. These filtered scenarios are deployed incrementally in order to meet desired risk levels while using minimum resources. In order to study the performance of the proposed scheme, we illustrate the procedure on case studies performed for both 24-bus and 118-bus systems with real-world wind power forecasting data. Numerical results suggest that the proposed filter-and-increment scenario generation model leads to a precise and efficient solution for the chance-constrained economic dispatch problem. △ Less

Submitted 2 January, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

Comments: 12 pages

Showing 1–50 of 295 results for author: Zhang, Q