-
Reconfigurable Intelligent Computational Surfaces for MEC-Assisted Autonomous Driving Networks: Design Optimization and Analysis
Authors:
Xueyao Zhang,
Bo Yang,
Zhiwen Yu,
Xuelin Cao,
George C. Alexandropoulos,
Yan Zhang,
Merouane Debbah,
Chau Yuen
Abstract:
This paper investigates autonomous driving safety improvement via task offloading from cellular vehicles (CVs) to a multi-access edge computing (MEC) server using vehicle-to-infrastructure (V2I) links. Considering that the latter links can be reused by vehicle-to-vehicle (V2V) communications to improve spectrum utilization, the receiver of the V2I link may suffer from severe interference that can…
▽ More
This paper investigates autonomous driving safety improvement via task offloading from cellular vehicles (CVs) to a multi-access edge computing (MEC) server using vehicle-to-infrastructure (V2I) links. Considering that the latter links can be reused by vehicle-to-vehicle (V2V) communications to improve spectrum utilization, the receiver of the V2I link may suffer from severe interference that can cause outages during the task offloading. To tackle this issue, we propose the deployment of a reconfigurable intelligent computational surface (RICS) whose computationally capable metamaterials are leveraged to jointly enable V2I reflective links as well as to implement interference cancellation at the V2V links. We devise a joint optimization formulation for the task offloading ratio between the CVs and the MEC server, the spectrum sharing strategy between V2V and V2I communications, as well as the RICS reflection and refraction matrices to maximize an autonomous driving safety task. Due to the non-convexity of the problem and the coupling among its free variables, we transform it into a more tractable equivalent form, which is then decomposed into three sub-problems solved via an alternate approximation method. Our simulation results showcase that the proposed RICS-assisted offloading framework significantly improves the safety of the considered autonomous driving network, yielding a nearly 34\% improvement in the safety coefficient of the CVs. In addition, it is demonstrated that the V2V data rate can be improved by around 60\% indicating that the RICS-induced adjustment of the signals can effectively mitigate interference at the V2V link.
△ Less
Submitted 30 June, 2024;
originally announced July 2024.
-
Filtering Reconfigurable Intelligent Computational Surface for RF Spectrum Purification
Authors:
Kaining Wang,
Bo Yang,
Zhiwen Yu,
Xuelin Cao,
Mérouane Debbah,
Chau Yuen
Abstract:
The increasing demand for communication is degrading the electromagnetic (EM) transmission environment due to severe EM interference, significantly reducing the efficiency of the radio frequency (RF) spectrum. Metasurfaces, a promising technology for controlling desired EM waves, have recently received significant attention from both academia and industry. However, the potential impact of out-of-b…
▽ More
The increasing demand for communication is degrading the electromagnetic (EM) transmission environment due to severe EM interference, significantly reducing the efficiency of the radio frequency (RF) spectrum. Metasurfaces, a promising technology for controlling desired EM waves, have recently received significant attention from both academia and industry. However, the potential impact of out-of-band signals has been largely overlooked, leading to RF spectrum pollution and degradation of wireless transmissions. To address this issue, we propose a novel surface structure called the Filtering Reconfigurable Intelligent Computational Surface (FRICS). We introduce two types of FRICS structures: one that dynamically reflects resonance band signals through a tunable spatial filter while absorbing out-of-band signals using metamaterials and the other one that dynamically amplifies in-band signals using computational metamaterials while reflecting out-of-band signals. To evaluate the performance of FRICS, we implement it in device-to-device (D2D) communication and vehicular-to-everything (V2X) scenarios. The experiments demonstrate the superiority of FRICS in signal-to-interference-noise ratio (SINR) and energy efficiency (EE). Finally, we discuss the critical challenges faced and promising techniques for implementing FRICS in future wireless systems.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
Wound Tissue Segmentation in Diabetic Foot Ulcer Images Using Deep Learning: A Pilot Study
Authors:
Mrinal Kanti Dhar,
Chuanbo Wang,
Yash Patel,
Taiyu Zhang,
Jeffrey Niezgoda,
Sandeep Gopalakrishnan,
Keke Chen,
Zeyun Yu
Abstract:
Identifying individual tissues, so-called tissue segmentation, in diabetic foot ulcer (DFU) images is a challenging task and little work has been published, largely due to the limited availability of a clinical image dataset. To address this gap, we have created a DFUTissue dataset for the research community to evaluate wound tissue segmentation algorithms. The dataset contains 110 images with tis…
▽ More
Identifying individual tissues, so-called tissue segmentation, in diabetic foot ulcer (DFU) images is a challenging task and little work has been published, largely due to the limited availability of a clinical image dataset. To address this gap, we have created a DFUTissue dataset for the research community to evaluate wound tissue segmentation algorithms. The dataset contains 110 images with tissues labeled by wound experts and 600 unlabeled images. Additionally, we conducted a pilot study on segmenting wound characteristics including fibrin, granulation, and callus using deep learning. Due to the limited amount of annotated data, our framework consists of both supervised learning (SL) and semi-supervised learning (SSL) phases. In the SL phase, we propose a hybrid model featuring a Mix Transformer (MiT-b3) in the encoder and a CNN in the decoder, enhanced by the integration of a parallel spatial and channel squeeze-and-excitation (P-scSE) module known for its efficacy in improving boundary accuracy. The SSL phase employs a pseudo-labeling-based approach, iteratively identifying and incorporating valuable unlabeled images to enhance overall segmentation performance. Comparative evaluations with state-of-the-art methods are conducted for both SL and SSL phases. The SL achieves a Dice Similarity Coefficient (DSC) of 84.89%, which has been improved to 87.64% in the SSL phase. Furthermore, the results are benchmarked against two widely used SSL approaches: Generative Adversarial Networks and Cross-Consistency Training. Additionally, our hybrid model outperforms the state-of-the-art methods with a 92.99% DSC in performing binary segmentation of DFU wound areas when tested on the Chronic Wound dataset. Codes and data are available at https://github.com/uwm-bigdata/DFUTissueSegNet.
△ Less
Submitted 23 June, 2024;
originally announced June 2024.
-
AI-Empowered Multiple Access for 6G: A Survey of Spectrum Sensing, Protocol Designs, and Optimizations
Authors:
Xuelin Cao,
Bo Yang,
Kaining Wang,
Xinghua Li,
Zhiwen Yu,
Chau Yuen,
Yan Zhang,
Zhu Han
Abstract:
With the rapidly increasing number of bandwidth-intensive terminals capable of intelligent computing and communication, such as smart devices equipped with shallow neural network models, the complexity of multiple access for these intelligent terminals is increasing due to the dynamic network environment and ubiquitous connectivity in 6G systems. Traditional multiple access (MA) design and optimiz…
▽ More
With the rapidly increasing number of bandwidth-intensive terminals capable of intelligent computing and communication, such as smart devices equipped with shallow neural network models, the complexity of multiple access for these intelligent terminals is increasing due to the dynamic network environment and ubiquitous connectivity in 6G systems. Traditional multiple access (MA) design and optimization methods are gradually losing ground to artificial intelligence (AI) techniques that have proven their superiority in handling complexity. AI-empowered MA and its optimization strategies aimed at achieving high Quality-of-Service (QoS) are attracting more attention, especially in the area of latency-sensitive applications in 6G systems. In this work, we aim to: 1) present the development and comparative evaluation of AI-enabled MA; 2) provide a timely survey focusing on spectrum sensing, protocol design, and optimization for AI-empowered MA; and 3) explore the potential use cases of AI-empowered MA in the typical application scenarios within 6G systems. Specifically, we first present a unified framework of AI-empowered MA for 6G systems by incorporating various promising machine learning techniques in spectrum sensing, resource allocation, MA protocol design, and optimization. We then introduce AI-empowered MA spectrum sensing related to spectrum sharing and spectrum interference management. Next, we discuss the AI-empowered MA protocol designs and implementation methods by reviewing and comparing the state-of-the-art, and we further explore the optimization algorithms related to dynamic resource management, parameter adjustment, and access scheme switching. Finally, we discuss the current challenges, point out open issues, and outline potential future research directions in this field.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
Interpretable modulated differentiable STFT and physics-informed balanced spectrum metric for freight train wheelset bearing cross-machine transfer fault diagnosis under speed fluctuations
Authors:
Chao He,
Hongmei Shi,
Ruixin Li,
Jianbo Li,
ZuJun Yu
Abstract:
The service conditions of wheelset bearings has a direct impact on the safe operation of railway heavy haul freight trains as the key components. However, speed fluctuation of the trains and few fault samples are the two main problems that restrict the accuracy of bearing fault diagnosis. Therefore, a cross-machine transfer diagnosis (pyDSN) network coupled with interpretable modulated differentia…
▽ More
The service conditions of wheelset bearings has a direct impact on the safe operation of railway heavy haul freight trains as the key components. However, speed fluctuation of the trains and few fault samples are the two main problems that restrict the accuracy of bearing fault diagnosis. Therefore, a cross-machine transfer diagnosis (pyDSN) network coupled with interpretable modulated differentiable short-time Fourier transform (STFT) and physics-informed balanced spectrum quality metric is proposed to learn domain-invariant and discriminative features under time-varying speeds. Firstly, due to insufficiency in extracting extract frequency components of time-varying speed signals using fixed windows, a modulated differentiable STFT (MDSTFT) that is interpretable with STFT-informed theoretical support, is proposed to extract the robust time-frequency spectrum (TFS). During training process, multiple windows with different lengths dynamically change. Also, in addition to the classification metric and domain discrepancy metric, we creatively introduce a third kind of metric, referred to as the physics-informed metric, to enhance transferable TFS. A physics-informed balanced spectrum quality (BSQ) regularization loss is devised to guide an optimization direction for MDSTFT and model. With it, not only can model acquire high-quality TFS, but also a physics-restricted domain adaptation network can be also acquired, making it learn real-world physics knowledge, ultimately diminish the domain discrepancy across different datasets. The experiment is conducted in the scenario of migrating from the laboratory datasets to the freight train dataset, indicating that the hybrid-driven pyDSN outperforms existing methods and has practical value.
△ Less
Submitted 16 June, 2024;
originally announced June 2024.
-
Q-Mamba: On First Exploration of Vision Mamba for Image Quality Assessment
Authors:
Fengbin Guan,
Xin Li,
Zihao Yu,
Yiting Lu,
Zhibo Chen
Abstract:
In this work, we take the first exploration of the recently popular foundation model, i.e., State Space Model/Mamba, in image quality assessment, aiming at observing and excavating the perception potential in vision Mamba. A series of works on Mamba has shown its significant potential in various fields, e.g., segmentation and classification. However, the perception capability of Mamba has been und…
▽ More
In this work, we take the first exploration of the recently popular foundation model, i.e., State Space Model/Mamba, in image quality assessment, aiming at observing and excavating the perception potential in vision Mamba. A series of works on Mamba has shown its significant potential in various fields, e.g., segmentation and classification. However, the perception capability of Mamba has been under-explored. Consequently, we propose Q-Mamba by revisiting and adapting the Mamba model for three crucial IQA tasks, i.e., task-specific, universal, and transferable IQA, which reveals that the Mamba model has obvious advantages compared with existing foundational models, e.g., Swin Transformer, ViT, and CNNs, in terms of perception and computational cost for IQA. To increase the transferability of Q-Mamba, we propose the StylePrompt tuning paradigm, where the basic lightweight mean and variance prompts are injected to assist the task-adaptive transfer learning of pre-trained Q-Mamba for different downstream IQA tasks. Compared with existing prompt tuning strategies, our proposed StylePrompt enables better perception transfer capability with less computational cost. Extensive experiments on multiple synthetic, authentic IQA datasets, and cross IQA datasets have demonstrated the effectiveness of our proposed Q-Mamba.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
Environment-Aware Codebook Design for RIS-Assisted MU-MISO Communications: Implementation and Performance Analysis
Authors:
Zhiheng Yu,
Jiancheng An,
Ertugrul Basar,
Lu Gan,
Chau Yuen
Abstract:
Reconfigurable intelligent surface (RIS) provides a new electromagnetic response control solution, which can proactively reshape the characteristics of wireless channel environments. In RIS-assisted communication systems, the acquisition of channel state information (CSI) and the optimization of reflecting coefficients constitute major design challenges. To address these issues, codebook-based sol…
▽ More
Reconfigurable intelligent surface (RIS) provides a new electromagnetic response control solution, which can proactively reshape the characteristics of wireless channel environments. In RIS-assisted communication systems, the acquisition of channel state information (CSI) and the optimization of reflecting coefficients constitute major design challenges. To address these issues, codebook-based solutions have been developed recently, which, however, are mostly environment-agnostic. In this paper, a novel environment-aware codebook protocol is proposed, which can significantly reduce both pilot overhead and computational complexity, while maintaining expected communication performance. Specifically, first of all, a channel training framework is introduced to divide the training phase into several blocks. In each block, we directly estimate the composite end-to-end channel and focus only on the transmit beamforming. Second, we propose an environment-aware codebook generation scheme, which first generates a group of channels based on statistical CSI, and then obtains their corresponding RIS configuration by utilizing the alternating optimization (AO) method offline. In each online training block, the RIS is configured based on the corresponding codeword in the environment-aware codebook, and the optimal codeword resulting in the highest sum rate is adopted for assisting in the downlink data transmission. Third, we analyze the theoretical performance of the environment-aware codebook-based protocol taking into account the channel estimation errors. Finally, numerical simulations are provided to verify our theoretical analysis and the performance of the proposed scheme. In particular, the simulation results demonstrate that our protocol is more competitive than conventional environment-agnostic codebooks.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
In-sensor Computing ANN Capacitive Sensors
Authors:
Guihua Zhao,
Yating Peng,
Jiaxin Zhu,
Xin Tang,
Zhiyi Yu
Abstract:
This letter proposes an in-sensor computing multiply-and-accumulate (MAC) circuit based on capacitance. The MAC circuits can constitute an artificial neural network(ANN) layer and be operated as ANN classifiers and autoencoders. The proposed circuit is a promising scheme for capacitive ANN image sensors, showing competitively high efficiency and lower power.
This letter proposes an in-sensor computing multiply-and-accumulate (MAC) circuit based on capacitance. The MAC circuits can constitute an artificial neural network(ANN) layer and be operated as ANN classifiers and autoencoders. The proposed circuit is a promising scheme for capacitive ANN image sensors, showing competitively high efficiency and lower power.
△ Less
Submitted 27 May, 2024;
originally announced May 2024.
-
Estimation of Participation Factors for Power System Oscillation from Measurements
Authors:
Tianwei Xia,
Zhe Yu,
Kai Sun,
Di Shi,
Kaiyang Huang
Abstract:
In a power system, when the participation factors of generators are computed to rank their participations into an oscillatory mode, a model-based approach is conventionally used on the linearized system model by means of the corresponding right and left eigenvectors. This paper proposes a new approach for estimating participation factors directly from measurement data on generator responses under…
▽ More
In a power system, when the participation factors of generators are computed to rank their participations into an oscillatory mode, a model-based approach is conventionally used on the linearized system model by means of the corresponding right and left eigenvectors. This paper proposes a new approach for estimating participation factors directly from measurement data on generator responses under selected disturbances. The approach computes extended participation factors that coincide with accurate model-based participation factors when the measured responses satisfy an ideally symmetric condition. This paper relaxes this symmetric condition with the original measurement space by identifying and utilizing a coordinate transformation to a new space optimally recovering the symmetry. Thus, the optimal estimates of participation factors solely from measurements are achieved, and the accuracy and influencing factors are discussed. The proposed approach is first demonstrated in detail on a two-area system and then tested on an NPCC 48-machine power system. The penetration of inverter-based resources is also considered.
△ Less
Submitted 14 May, 2024;
originally announced May 2024.
-
Benchmarking Cross-Domain Audio-Visual Deception Detection
Authors:
Xiaobao Guo,
Zitong Yu,
Nithish Muthuchamy Selvaraj,
Bingquan Shen,
Adams Wai-Kin Kong,
Alex C. Kot
Abstract:
Automated deception detection is crucial for assisting humans in accurately assessing truthfulness and identifying deceptive behavior. Conventional contact-based techniques, like polygraph devices, rely on physiological signals to determine the authenticity of an individual's statements. Nevertheless, recent developments in automated deception detection have demonstrated that multimodal features d…
▽ More
Automated deception detection is crucial for assisting humans in accurately assessing truthfulness and identifying deceptive behavior. Conventional contact-based techniques, like polygraph devices, rely on physiological signals to determine the authenticity of an individual's statements. Nevertheless, recent developments in automated deception detection have demonstrated that multimodal features derived from both audio and video modalities may outperform human observers on publicly available datasets. Despite these positive findings, the generalizability of existing audio-visual deception detection approaches across different scenarios remains largely unexplored. To close this gap, we present the first cross-domain audio-visual deception detection benchmark, that enables us to assess how well these methods generalize for use in real-world scenarios. We used widely adopted audio and visual features and different architectures for benchmarking, comparing single-to-single and multi-to-single domain generalization performance. To further exploit the impacts using data from multiple source domains for training, we investigate three types of domain sampling strategies, including domain-simultaneous, domain-alternating, and domain-by-domain for multi-to-single domain generalization evaluation. Furthermore, we proposed the Attention-Mixer fusion method to improve performance, and we believe that this new cross-domain benchmark will facilitate future research in audio-visual deception detection. Protocols and source code are available at \href{https://github.com/Redaimao/cross_domain_DD}{https://github.com/Redaimao/cross\_domain\_DD}.
△ Less
Submitted 11 May, 2024;
originally announced May 2024.
-
Computation Offloading for Multi-server Multi-access Edge Vehicular Networks: A DDQN-based Method
Authors:
Siyu Wang,
Bo Yang,
Zhiwen Yu,
Xuelin Cao,
Yan Zhang,
Chau Yuen
Abstract:
In this paper, we investigate a multi-user offloading problem in the overlap** domain of a multi-server mobile edge computing system. We divide the original problem into two stages: the offloading decision making stage and the request scheduling stage. To prevent the terminal from going out of service area during offloading, we consider the mobility parameter of the terminal according to the hum…
▽ More
In this paper, we investigate a multi-user offloading problem in the overlap** domain of a multi-server mobile edge computing system. We divide the original problem into two stages: the offloading decision making stage and the request scheduling stage. To prevent the terminal from going out of service area during offloading, we consider the mobility parameter of the terminal according to the human behaviour model when making the offloading decision, and then introduce a server evaluation mechanism based on both the mobility parameter and the server load to select the optimal offloading server. In order to fully utilise the server resources, we design a double deep Q-network (DDQN)-based reward evaluation algorithm that considers the priority of tasks when scheduling offload requests. Finally, numerical simulations are conducted to verify that our proposed method outperforms traditional mathematical computation methods as well as the DQN algorithm.
△ Less
Submitted 20 February, 2024;
originally announced April 2024.
-
Network-Constrained Unit Commitment with Flexible Temporal Resolution
Authors:
Zekuan Yu,
Haiwang Zhong,
Guangchun Ruan,
Xinfei Yan
Abstract:
Modern network-constrained unit commitment (NCUC) bears a heavy computational burden due to the ever-growing model scale. This situation becomes more challenging when detailed operational characteristics, complicated constraints, and multiple objectives are considered. We propose a novel simplification method to determine the flexible temporal resolution for acceleration and near-optimal solutions…
▽ More
Modern network-constrained unit commitment (NCUC) bears a heavy computational burden due to the ever-growing model scale. This situation becomes more challenging when detailed operational characteristics, complicated constraints, and multiple objectives are considered. We propose a novel simplification method to determine the flexible temporal resolution for acceleration and near-optimal solutions. The flexible temporal resolution is determined by analyzing the impact on generators in each adaptive time period with awareness of congestion effects. Additionally, multiple improvements are employed on the existing NCUC model compatible with flexible temporal resolution to reduce the number of integer variables while preserving the original features. A case study using the IEEE 118-bus and the Polish 2736-bus systems verifies that the proposed method achieves substantial acceleration with low cost variation and high accuracy.
△ Less
Submitted 8 April, 2024;
originally announced April 2024.
-
Force-EvT: A Closer Look at Robotic Gripper Force Measurement with Event-based Vision Transformer
Authors:
Qianyu Guo,
Ziqing Yu,
Jiaming Fu,
Yawen Lu,
Yahya Zweiri,
Dongming Gan
Abstract:
Robotic grippers are receiving increasing attention in various industries as essential components of robots for interacting and manipulating objects. While significant progress has been made in the past, conventional rigid grippers still have limitations in handling irregular objects and can damage fragile objects. We have shown that soft grippers offer deformability to adapt to a variety of objec…
▽ More
Robotic grippers are receiving increasing attention in various industries as essential components of robots for interacting and manipulating objects. While significant progress has been made in the past, conventional rigid grippers still have limitations in handling irregular objects and can damage fragile objects. We have shown that soft grippers offer deformability to adapt to a variety of object shapes and maximize object protection. At the same time, dynamic vision sensors (e.g., event-based cameras) are capable of capturing small changes in brightness and streaming them asynchronously as events, unlike RGB cameras, which do not perform well in low-light and fast-moving environments. In this paper, a dynamic-vision-based algorithm is proposed to measure the force applied to the gripper. In particular, we first set up a DVXplorer Lite series event camera to capture twenty-five sets of event data. Second, motivated by the impressive performance of the Vision Transformer (ViT) algorithm in dense image prediction tasks, we propose a new approach that demonstrates the potential for real-time force estimation and meets the requirements of real-world scenarios. We extensively evaluate the proposed algorithm on a wide range of scenarios and settings, and show that it consistently outperforms recent approaches.
△ Less
Submitted 1 April, 2024;
originally announced April 2024.
-
Environment-Aware Codebook for RIS-Assisted MU-MISO Communications: Implementation and Performance Analysis
Authors:
Zhiheng Yu,
Jiancheng An,
Lu Gan,
Chau Yuen
Abstract:
Reconfigurable intelligent surface (RIS) provides a new electromagnetic response control solution, which can reshape the characteristics of wireless channels. In this paper, we propose a novel environment-aware codebook protocol for RIS-assisted multi-user multiple-input single-output (MU-MISO) systems. Specifically, we first introduce a channel training protocol which consists of off-line and on-…
▽ More
Reconfigurable intelligent surface (RIS) provides a new electromagnetic response control solution, which can reshape the characteristics of wireless channels. In this paper, we propose a novel environment-aware codebook protocol for RIS-assisted multi-user multiple-input single-output (MU-MISO) systems. Specifically, we first introduce a channel training protocol which consists of off-line and on-line stages. Secondly, we propose an environment-aware codebook generation scheme, which utilizes the statistical channel state information and alternating optimization method to generate codewords offline. Then, in the on-line stage, we use these pre-designed codewords to configure the RIS, and the optimal codeword resulting in the highest sum rate is adopted for assisting in the downlink data transmission. Thirdly, we analyze the theoretical performance of the proposed protocol considering the channel estimation errors. Finally, numerical simulations are provided to verify our theoretical analysis and the performance of the proposed scheme.
△ Less
Submitted 30 March, 2024;
originally announced April 2024.
-
Safeguarding Medical Image Segmentation Datasets against Unauthorized Training via Contour- and Texture-Aware Perturbations
Authors:
Xun Lin,
Yi Yu,
Song Xia,
Jue Jiang,
Haoran Wang,
Zitong Yu,
Yizhong Liu,
Ying Fu,
Shuai Wang,
Wenzhong Tang,
Alex Kot
Abstract:
The widespread availability of publicly accessible medical images has significantly propelled advancements in various research and clinical fields. Nonetheless, concerns regarding unauthorized training of AI systems for commercial purposes and the duties of patient privacy protection have led numerous institutions to hesitate to share their images. This is particularly true for medical image segme…
▽ More
The widespread availability of publicly accessible medical images has significantly propelled advancements in various research and clinical fields. Nonetheless, concerns regarding unauthorized training of AI systems for commercial purposes and the duties of patient privacy protection have led numerous institutions to hesitate to share their images. This is particularly true for medical image segmentation (MIS) datasets, where the processes of collection and fine-grained annotation are time-intensive and laborious. Recently, Unlearnable Examples (UEs) methods have shown the potential to protect images by adding invisible shortcuts. These shortcuts can prevent unauthorized deep neural networks from generalizing. However, existing UEs are designed for natural image classification and fail to protect MIS datasets imperceptibly as their protective perturbations are less learnable than important prior knowledge in MIS, e.g., contour and texture features. To this end, we propose an Unlearnable Medical image generation method, termed UMed. UMed integrates the prior knowledge of MIS by injecting contour- and texture-aware perturbations to protect images. Given that our target is to only poison features critical to MIS, UMed requires only minimal perturbations within the ROI and its contour to achieve greater imperceptibility (average PSNR is 50.03) and protective performance (clean average DSC degrades from 82.18% to 6.80%).
△ Less
Submitted 21 March, 2024;
originally announced March 2024.
-
Beamforming Design for Double-Active-RIS-aided Communication Systems with Inter-Excitation
Authors:
Boshi Wang,
Cunhua Pan,
Hong Ren,
Zhiyuan Yu,
Yang Zhang,
Mengyu Liu,
Gui Zhou
Abstract:
In this paper, we investigate a double-active-reconfigurable intelligent surface (RIS)-aided downlink wireless communication system, where a multi-antenna base station (BS) serves multiple single-antenna users with both double reflection and single reflection links. Due to the signal amplification capability of active RISs, the mutual influence between active RISs, which is termed as the "inter-ex…
▽ More
In this paper, we investigate a double-active-reconfigurable intelligent surface (RIS)-aided downlink wireless communication system, where a multi-antenna base station (BS) serves multiple single-antenna users with both double reflection and single reflection links. Due to the signal amplification capability of active RISs, the mutual influence between active RISs, which is termed as the "inter-excitation" effect, cannot be ignored. Then, we develop a feedback-type model to characterize the signal containing the inter-excitation effect. Based on the signal model, we formulate a weighted sum rate (WSR) maximization problem by jointly optimizing the beamforming matrix at the BS and the reflecting coefficient matrices at the two active RISs, subject to power constraints at the BS and active RISs, as well as the maximum amplification gain constraints of the active RISs. To solve this non-convex problem, we first transform the problem into a more tractable form using the fractional programming (FP) method. Then, by introducing auxiliary variables, the problem can be converted into an equivalent form that can be solved by using a low-complexity penalty dual decomposition (PDD) algorithm. Finally, simulation results indicate that it is crucial to consider the inter-excitation effect between active RISs in beamforming design for double-active-RIS-aided communication systems. Additionally, it prevails over other benchmark schemes with single active RIS and double passive RISs in terms of achievable rate.
△ Less
Submitted 16 March, 2024;
originally announced March 2024.
-
Compute-first optical detection for noise-resilient visual perception
Authors:
Jungmin Kim,
Nanfang Yu,
Zongfu Yu
Abstract:
In the context of visual perception, the optical signal from a scene is transferred into the electronic domain by detectors in the form of image data, which are then processed for the extraction of visual information. In noisy and weak-signal environments such as thermal imaging for night vision applications, however, the performance of neural computing tasks faces a significant bottleneck due to…
▽ More
In the context of visual perception, the optical signal from a scene is transferred into the electronic domain by detectors in the form of image data, which are then processed for the extraction of visual information. In noisy and weak-signal environments such as thermal imaging for night vision applications, however, the performance of neural computing tasks faces a significant bottleneck due to the inherent degradation of data quality upon noisy detection. Here, we propose a concept of optical signal processing before detection to address this issue. We demonstrate that spatially redistributing optical signals through a properly designed linear transformer can enhance the detection noise resilience of visual perception tasks, as benchmarked with the MNIST classification. Our idea is supported by a quantitative analysis detailing the relationship between signal concentration and noise robustness, as well as its practical implementation in an incoherent imaging system. This compute-first detection scheme can pave the way for advancing infrared machine vision technologies widely used for industrial and defense applications.
△ Less
Submitted 14 March, 2024;
originally announced March 2024.
-
Single-Image HDR Reconstruction Assisted Ghost Suppression and Detail Preservation Network for Multi-Exposure HDR Imaging
Authors:
Huafeng Li,
Zhenmei Yang,
Yafei Zhang,
Dapeng Tao,
Zhengtao Yu
Abstract:
The reconstruction of high dynamic range (HDR) images from multi-exposure low dynamic range (LDR) images in dynamic scenes presents significant challenges, especially in preserving and restoring information in oversaturated regions and avoiding ghosting artifacts. While current methods often struggle to address these challenges, our work aims to bridge this gap by develo** a multi-exposure HDR i…
▽ More
The reconstruction of high dynamic range (HDR) images from multi-exposure low dynamic range (LDR) images in dynamic scenes presents significant challenges, especially in preserving and restoring information in oversaturated regions and avoiding ghosting artifacts. While current methods often struggle to address these challenges, our work aims to bridge this gap by develo** a multi-exposure HDR image reconstruction network for dynamic scenes, complemented by single-frame HDR image reconstruction. This network, comprising single-frame HDR reconstruction with enhanced stop image (SHDR-ESI) and SHDR-ESI-assisted multi-exposure HDR reconstruction (SHDRA-MHDR), effectively leverages the ghost-free characteristic of single-frame HDR reconstruction and the detail-enhancing capability of ESI in oversaturated areas. Specifically, SHDR-ESI innovatively integrates single-frame HDR reconstruction with the utilization of ESI. This integration not only optimizes the single image HDR reconstruction process but also effectively guides the synthesis of multi-exposure HDR images in SHDR-AMHDR. In this method, the single-frame HDR reconstruction is specifically applied to reduce potential ghosting effects in multiexposure HDR synthesis, while the use of ESI images assists in enhancing the detail information in the HDR synthesis process. Technically, SHDR-ESI incorporates a detail enhancement mechanism, which includes a self-representation module and a mutual-representation module, designed to aggregate crucial information from both reference image and ESI. To fully leverage the complementary information from non-reference images, a feature interaction fusion module is integrated within SHDRA-MHDR. Additionally, a ghost suppression module, guided by the ghost-free results of SHDR-ESI, is employed to suppress the ghosting artifacts.
△ Less
Submitted 7 March, 2024;
originally announced March 2024.
-
Reconfigurable Intelligent Surface assisted Integrated Communication, Sensing, and Computation Systems
Authors:
Jiahua Wan,
Hong Ren,
Zhiyuan Yu,
Zhenkun Zhang,
Yang Zhang,
Cunhua Pan,
Jiangzhou Wang
Abstract:
This paper studies a mobile edge computing (MEC) assisted integrated sensing and communication (ISAC), where reconfigurable intelligent surface (RIS) is used to alleviate the attenuation of communication links during computational offloading. In this paradigm, the dual function radar and communication (DFRC)-enabled user equipments (UEs) simultaneously perform radar sensing and communication tasks…
▽ More
This paper studies a mobile edge computing (MEC) assisted integrated sensing and communication (ISAC), where reconfigurable intelligent surface (RIS) is used to alleviate the attenuation of communication links during computational offloading. In this paradigm, the dual function radar and communication (DFRC)-enabled user equipments (UEs) simultaneously perform radar sensing and communication tasks, facilitating the offloading of partial computational tasks to an edge computing server (ECS) through an RIS. To satisfy the critical need for time-efficient sensing, we investigate the latency minimization problem for both single-UE and multi-UE scenarios, subject to constraints on UEs' transmit power, radar signal-to-interference-plus-noise-ratio (SINR), RIS phase shift, and computation capability. To address the formulated non-convex problem, we propose an algorithm based on the block coordinate descent (BCD) method to decouple the original problem into two subproblems, where the computational and beamforming settings are optimized alternately. Specifically, we employ the Lagrangian method for the optimization of computational settings. In addition, two equivalent transformations are applied to transform the objective function (OF) of the subproblem for active and passive beamforming into a form of weighted sum-rate. Then, a combination of minimum variance distortionless response (MVDR), successive convex approximation (SCA), majorization-minimization (MM), and weighted minimum mean-square error (WMMSE) techniques are applied to tackle this subproblem. Moreover, a closed-form solution is obtained in the simplified single UE scenario. Finally, simulation results substantiate the effectiveness of the proposed system.
△ Less
Submitted 14 March, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
MINT: Boosting Audio-Language Model via Multi-Target Pre-Training and Instruction Tuning
Authors:
Hang Zhao,
Yifei Xin,
Zhesong Yu,
Bilei Zhu,
Lu Lu,
Zejun Ma
Abstract:
In the realm of audio-language pre-training (ALP), the challenge of achieving cross-modal alignment is significant. Moreover, the integration of audio inputs with diverse distributions and task variations poses challenges in develo** generic audio-language models. In this study, we present MINT, a novel ALP framework boosting audio-language models through multi-target pre-training and instructio…
▽ More
In the realm of audio-language pre-training (ALP), the challenge of achieving cross-modal alignment is significant. Moreover, the integration of audio inputs with diverse distributions and task variations poses challenges in develo** generic audio-language models. In this study, we present MINT, a novel ALP framework boosting audio-language models through multi-target pre-training and instruction tuning. MINT leverages the strength of frozen pre-trained audio encoders and large language models (LLM) to improve audio-language pre-training, enabling effective transferablility to both audio-text understanding and generation tasks. To address the modality gap, we introduce Bridge-Net, a trainable module that enhances cross-modality alignment and the model's ability to follow instructions for a variety of audio-text tasks. Bridge-Net is pivotal within MINT, initially enhancing audio-language representation learning through a multi-target pre-training approach. Subsequently, Bridge-Net further boosts audio-to-language generative learning by integrating a frozen language model with instruction tuning. This integration empowers MINT to extract features in a flexible and effective manner, specifically tailored to the provided instructions for diverse tasks. Experimental results demonstrate that MINT attains superior performance across various audio-language understanding and generation tasks, highlighting its robust generalization capabilities even in zero-shot scenarios.
△ Less
Submitted 11 June, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
Reconfigurable Intelligent Surface-Aided Dual-Function Radar and Communication Systems With MU-MIMO Communication
Authors:
Yasheng **,
Hong Ren,
Cunhua Pan,
Zhiyuan Yu,
Ruisong Weng,
Boshi Wang,
Gui Zhou,
Yongchao He,
Maged Elkashlan
Abstract:
In this paper, we investigate an reconfigurable intelligent surface (RIS)-aided integrated sensing and communication (ISAC) system. Our objective is to maximize the achievable sum rate of the multi-antenna communication users through the joint active and passive beamforming. {Specifically}, the weighted minimum mean-square error (WMMSE) method is { first} used to reformulate the original problem i…
▽ More
In this paper, we investigate an reconfigurable intelligent surface (RIS)-aided integrated sensing and communication (ISAC) system. Our objective is to maximize the achievable sum rate of the multi-antenna communication users through the joint active and passive beamforming. {Specifically}, the weighted minimum mean-square error (WMMSE) method is { first} used to reformulate the original problem into an equivalent one. Then, we utilize an alternating optimization (AO) { algorithm} to decouple the optimization variables and decompose this challenging problem into two subproblems. Given reflecting coefficients, a penalty-based algorithm is utilized to deal with the non-convex radar signal-to-noise ratio (SNR) constraints. For the given beamforming matrix of the BS, we apply majorization-minimization (MM) to transform the problem into a quadratic constraint quadratic programming (QCQP) problem, which is ultimately solved using a semidefinite relaxation (SDR)-based algorithm. Simulation results illustrate the advantage of deploying RIS in the considered multi-user MIMO (MU-MIMO) ISAC systems.
△ Less
Submitted 8 February, 2024;
originally announced February 2024.
-
Joint Beamforming Design for Double Active RIS-assisted Radar-Communication Coexistence Systems
Authors:
Mengyu Liu,
Hong Ren,
Cunhua Pan,
Boshi Wang,
Zhiyuan Yu,
Ruisong Weng,
Kangda Zhi,
Yongchao He
Abstract:
Integrated sensing and communication (ISAC) technology has been considered as one of the key candidate technologies in the next-generation wireless communication systems. However, when radar and communication equipment coexist in the same system, i.e. radar-communication coexistence (RCC), the interference from communication systems to radar can be large and cannot be ignored. Recently, reconfigur…
▽ More
Integrated sensing and communication (ISAC) technology has been considered as one of the key candidate technologies in the next-generation wireless communication systems. However, when radar and communication equipment coexist in the same system, i.e. radar-communication coexistence (RCC), the interference from communication systems to radar can be large and cannot be ignored. Recently, reconfigurable intelligent surface (RIS) has been introduced into RCC systems to reduce the interference. However, the "multiplicative fading" effect introduced by passive RIS limits its performance. To tackle this issue, we consider a double active RIS-assisted RCC system, which focuses on the design of the radar's beamforming vector and the active RISs' reflecting coefficient matrices, to maximize the achievable data rate of the communication system. The considered system needs to meet the radar detection constraint and the power budgets at the radar and the RISs. Since the problem is non-convex, we propose an algorithm based on the penalty dual decomposition (PDD) framework. Specifically, we initially introduce auxiliary variables to reformulate the coupled variables into equation constraints and incorporate these constraints into the objective function through the PDD framework. Then, we decouple the equivalent problem into several subproblems by invoking the block coordinate descent (BCD) method. Furthermore, we employ the Lagrange dual method to alternately optimize these subproblems. Simulation results verify the effectiveness of the proposed algorithm. Furthermore, the results also show that under the same power budget, deploying double active RISs in RCC systems can achieve higher data rate than those with single active RIS and double passive RISs.
△ Less
Submitted 6 February, 2024;
originally announced February 2024.
-
Secure Wireless Communication in Active RIS-Assisted DFRC System
Authors:
Yang Zhang,
Hong Ren,
Cunhua Pan,
Boshi Wang,
Zhiyuan Yu,
Ruisong Weng,
Tuo Wu,
Yongchao He
Abstract:
This work considers a dual-functional radar and communication (DFRC) system with an active reconfigurable intelligent surface (RIS) and a potential eavesdropper. Our purpose is to maximize the secrecy rate (SR) of the system by jointly designing the beamforming matrix at the DFRC base station (BS) and the reflecting coefficients at the active RIS, subject to the signal-to-interference-plus-noise-r…
▽ More
This work considers a dual-functional radar and communication (DFRC) system with an active reconfigurable intelligent surface (RIS) and a potential eavesdropper. Our purpose is to maximize the secrecy rate (SR) of the system by jointly designing the beamforming matrix at the DFRC base station (BS) and the reflecting coefficients at the active RIS, subject to the signal-to-interference-plus-noise-ratio (SINR) constraint of the radar echo and the power consumption constraints at the DFRC-BS and active RIS. An alternating optimization (AO) algorithm based on semi-definite relaxation (SDR) and majorizationminimization (MM) is applied to solve the SR-maximization problem by alternately optimizing the beamforming matrix and the reflecting coefficients. Specifically, we first apply the SDR and successive convex approximation (SCA) methods to transform the two subproblems into more tractable forms, then the MM method is applied to derive a concave surrogate function and iteratively solve the subproblems. Finally, simulation results indicate that the active RIS can better confront the impact of "multiplicative fading" and outperforms traditional passive RIS in terms of both secure data rate and radar sensing performance.
△ Less
Submitted 3 February, 2024;
originally announced February 2024.
-
Reconfigurable Intelligent Computational Surfaces for MEC-Assisted Autonomous Driving Networks
Authors:
Bo Yang,
Xueyao Zhang,
Zhiwen Yu,
Xuelin Cao,
Chongwen Huang,
George C. Alexandropoulos,
Yan Zhang,
Merouane Debbah,
Chau Yuen
Abstract:
In this paper, we focus on improving autonomous driving safety via task offloading from cellular vehicles (CVs), using vehicle-to-infrastructure (V2I) links, to an multi-access edge computing (MEC) server. Considering that the frequencies used for V2I links can be reused for vehicle-to-vehicle (V2V) communications to improve spectrum utilization, the receiver of each V2I link may suffer from sever…
▽ More
In this paper, we focus on improving autonomous driving safety via task offloading from cellular vehicles (CVs), using vehicle-to-infrastructure (V2I) links, to an multi-access edge computing (MEC) server. Considering that the frequencies used for V2I links can be reused for vehicle-to-vehicle (V2V) communications to improve spectrum utilization, the receiver of each V2I link may suffer from severe interference, causing outages in the task offloading process. To tackle this issue, we propose the deployment of a reconfigurable intelligent computational surface (RICS) to enable, not only V2I reflective links, but also interference cancellation at the V2V links exploiting the computational capability of its metamaterials. We devise a joint optimization formulation for the task offloading ratio between the CVs and the MEC server, the spectrum sharing strategy between V2V and V2I communications, as well as the RICS reflection and refraction matrices, with the objective to maximize a safety-based autonomous driving task. Due to the non-convexity of the problem and the coupling among its free variables, we transform it into a more tractable equivalent form, which is then decomposed into three sub-problems and solved via an alternate approximation method. Our simulation results demonstrate the effectiveness of the proposed RICS optimization in improving the safety in autonomous driving networks.
△ Less
Submitted 1 February, 2024;
originally announced February 2024.
-
GarchingSim: An Autonomous Driving Simulator with Photorealistic Scenes and Minimalist Workflow
Authors:
Liguo Zhou,
Yinglei Song,
Yichao Gao,
Zhou Yu,
Michael Sodamin,
Hongshen Liu,
Liang Ma,
Lian Liu,
Hao Liu,
Yang Liu,
Haichuan Li,
Guang Chen,
Alois Knoll
Abstract:
Conducting real road testing for autonomous driving algorithms can be expensive and sometimes impractical, particularly for small startups and research institutes. Thus, simulation becomes an important method for evaluating these algorithms. However, the availability of free and open-source simulators is limited, and the installation and configuration process can be daunting for beginners and inte…
▽ More
Conducting real road testing for autonomous driving algorithms can be expensive and sometimes impractical, particularly for small startups and research institutes. Thus, simulation becomes an important method for evaluating these algorithms. However, the availability of free and open-source simulators is limited, and the installation and configuration process can be daunting for beginners and interdisciplinary researchers. We introduce an autonomous driving simulator with photorealistic scenes, meanwhile kee** a user-friendly workflow. The simulator is able to communicate with external algorithms through ROS2 or Socket.IO, making it compatible with existing software stacks. Furthermore, we implement a highly accurate vehicle dynamics model within the simulator to enhance the realism of the vehicle's physical effects. The simulator is able to serve various functions, including generating synthetic data and driving with machine learning-based algorithms. Moreover, we prioritize simplicity in the deployment process, ensuring that beginners find it approachable and user-friendly.
△ Less
Submitted 30 January, 2024; v1 submitted 28 January, 2024;
originally announced January 2024.
-
Video Quality Assessment Based on Swin TransformerV2 and Coarse to Fine Strategy
Authors:
Zihao Yu,
Fengbin Guan,
Yiting Lu,
Xin Li,
Zhibo Chen
Abstract:
The objective of non-reference video quality assessment is to evaluate the quality of distorted video without access to reference high-definition references. In this study, we introduce an enhanced spatial perception module, pre-trained on multiple image quality assessment datasets, and a lightweight temporal fusion module to address the no-reference visual quality assessment (NR-VQA) task. This m…
▽ More
The objective of non-reference video quality assessment is to evaluate the quality of distorted video without access to reference high-definition references. In this study, we introduce an enhanced spatial perception module, pre-trained on multiple image quality assessment datasets, and a lightweight temporal fusion module to address the no-reference visual quality assessment (NR-VQA) task. This model implements Swin Transformer V2 as a local-level spatial feature extractor and fuses these multi-stage representations through a series of transformer layers. Furthermore, a temporal transformer is utilized for spatiotemporal feature fusion across the video. To accommodate compressed videos of varying bitrates, we incorporate a coarse-to-fine contrastive strategy to enrich the model's capability to discriminate features from videos of different bitrates. This is an expanded version of the one-page abstract.
△ Less
Submitted 16 January, 2024;
originally announced January 2024.
-
UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction
Authors:
Jiaxin Guo,
Minghan Wang,
Xiaosong Qiao,
Daimeng Wei,
Hengchao Shang,
Zongyao Li,
Zhengzhe Yu,
Yinglu Li,
Chang Su,
Min Zhang,
Shimin Tao,
Hao Yang
Abstract:
Error correction techniques have been used to refine the output sentences from automatic speech recognition (ASR) models and achieve a lower word error rate (WER). Previous works usually adopt end-to-end models and has strong dependency on Pseudo Paired Data and Original Paired Data. But when only pre-training on Pseudo Paired Data, previous models have negative effect on correction. While fine-tu…
▽ More
Error correction techniques have been used to refine the output sentences from automatic speech recognition (ASR) models and achieve a lower word error rate (WER). Previous works usually adopt end-to-end models and has strong dependency on Pseudo Paired Data and Original Paired Data. But when only pre-training on Pseudo Paired Data, previous models have negative effect on correction. While fine-tuning on Original Paired Data, the source side data must be transcribed by a well-trained ASR model, which takes a lot of time and not universal. In this paper, we propose UCorrect, an unsupervised Detector-Generator-Selector framework for ASR Error Correction. UCorrect has no dependency on the training data mentioned before. The whole procedure is first to detect whether the character is erroneous, then to generate some candidate characters and finally to select the most confident one to replace the error character. Experiments on the public AISHELL-1 dataset and WenetSpeech dataset show the effectiveness of UCorrect for ASR error correction: 1) it achieves significant WER reduction, achieves 6.83\% even without fine-tuning and 14.29\% after fine-tuning; 2) it outperforms the popular NAR correction models by a large margin with a competitive low latency; and 3) it is an universal method, as it reduces all WERs of the ASR model with different decoding strategies and reduces all WERs of ASR models trained on different scale datasets.
△ Less
Submitted 11 January, 2024;
originally announced January 2024.
-
Prompt-driven Latent Domain Generalization for Medical Image Classification
Authors:
Siyuan Yan,
Chi Liu,
Zhen Yu,
Lie Ju,
Dwarikanath Mahapatra,
Brigid Betz-Stablein,
Victoria Mar,
Monika Janda,
Peter Soyer,
Zongyuan Ge
Abstract:
Deep learning models for medical image analysis easily suffer from distribution shifts caused by dataset artifacts bias, camera variations, differences in the imaging station, etc., leading to unreliable diagnoses in real-world clinical settings. Domain generalization (DG) methods, which aim to train models on multiple domains to perform well on unseen domains, offer a promising direction to solve…
▽ More
Deep learning models for medical image analysis easily suffer from distribution shifts caused by dataset artifacts bias, camera variations, differences in the imaging station, etc., leading to unreliable diagnoses in real-world clinical settings. Domain generalization (DG) methods, which aim to train models on multiple domains to perform well on unseen domains, offer a promising direction to solve the problem. However, existing DG methods assume domain labels of each image are available and accurate, which is typically feasible for only a limited number of medical datasets. To address these challenges, we propose a novel DG framework for medical image classification without relying on domain labels, called Prompt-driven Latent Domain Generalization (PLDG). PLDG consists of unsupervised domain discovery and prompt learning. This framework first discovers pseudo domain labels by clustering the bias-associated style features, then leverages collaborative domain prompts to guide a Vision Transformer to learn knowledge from discovered diverse domains. To facilitate cross-domain knowledge learning between different prompts, we introduce a domain prompt generator that enables knowledge sharing between domain prompts and a shared prompt. A domain mixup strategy is additionally employed for more flexible decision margins and mitigates the risk of incorrect domain assignments. Extensive experiments on three medical image classification tasks and one debiasing task demonstrate that our method can achieve comparable or even superior performance than conventional DG algorithms without relying on domain labels. Our code will be publicly available upon the paper is accepted.
△ Less
Submitted 5 January, 2024;
originally announced January 2024.
-
Non-Cartesian Self-Supervised Physics-Driven Deep Learning Reconstruction for Highly-Accelerated Multi-Echo Spiral fMRI
Authors:
Hongyi Gu,
Chi Zhang,
Zidan Yu,
Christoph Rettenmeier,
V. Andrew Stenger,
Mehmet Akçakaya
Abstract:
Functional MRI (fMRI) is an important tool for non-invasive studies of brain function. Over the past decade, multi-echo fMRI methods that sample multiple echo times has become popular with potential to improve quantification. While these acquisitions are typically performed with Cartesian trajectories, non-Cartesian trajectories, in particular spiral acquisitions, hold promise for denser sampling…
▽ More
Functional MRI (fMRI) is an important tool for non-invasive studies of brain function. Over the past decade, multi-echo fMRI methods that sample multiple echo times has become popular with potential to improve quantification. While these acquisitions are typically performed with Cartesian trajectories, non-Cartesian trajectories, in particular spiral acquisitions, hold promise for denser sampling of echo times. However, such acquisitions require very high acceleration rates for sufficient spatiotemporal resolutions. In this work, we propose to use a physics-driven deep learning (PD-DL) reconstruction to accelerate multi-echo spiral fMRI by 10-fold. We modify a self-supervised learning algorithm for optimized training with non-Cartesian trajectories and use it to train the PD-DL network. Results show that the proposed self-supervised PD-DL reconstruction achieves high spatio-temporal resolution with meaningful BOLD analysis.
△ Less
Submitted 9 December, 2023;
originally announced December 2023.
-
PneumoLLM: Harnessing the Power of Large Language Model for Pneumoconiosis Diagnosis
Authors:
Meiyue Song,
Zhihua Yu,
Jiaxin Wang,
Jiarui Wang,
Yuting Lu,
Baicun Li,
Xiaoxu Wang,
Qinghua Huang,
Zhijun Li,
Nikolaos I. Kanellakis,
Jiangfeng Liu,
**g Wang,
Binglu Wang,
Juntao Yang
Abstract:
The conventional pretraining-and-finetuning paradigm, while effective for common diseases with ample data, faces challenges in diagnosing data-scarce occupational diseases like pneumoconiosis. Recently, large language models (LLMs) have exhibits unprecedented ability when conducting multiple tasks in dialogue, bringing opportunities to diagnosis. A common strategy might involve using adapter layer…
▽ More
The conventional pretraining-and-finetuning paradigm, while effective for common diseases with ample data, faces challenges in diagnosing data-scarce occupational diseases like pneumoconiosis. Recently, large language models (LLMs) have exhibits unprecedented ability when conducting multiple tasks in dialogue, bringing opportunities to diagnosis. A common strategy might involve using adapter layers for vision-language alignment and diagnosis in a dialogic manner. Yet, this approach often requires optimization of extensive learnable parameters in the text branch and the dialogue head, potentially diminishing the LLMs' efficacy, especially with limited training data. In our work, we innovate by eliminating the text branch and substituting the dialogue head with a classification head. This approach presents a more effective method for harnessing LLMs in diagnosis with fewer learnable parameters. Furthermore, to balance the retention of detailed image information with progression towards accurate diagnosis, we introduce the contextual multi-token engine. This engine is specialized in adaptively generating diagnostic tokens. Additionally, we propose the information emitter module, which unidirectionally emits information from image tokens to diagnosis tokens. Comprehensive experiments validate the superiority of our methods and the effectiveness of proposed modules. Our codes can be found at https://github.com/CodeMonsterPHD/PneumoLLM/tree/main.
△ Less
Submitted 28 June, 2024; v1 submitted 6 December, 2023;
originally announced December 2023.
-
On the Maximization of Long-Run Reward CVaR for Markov Decision Processes
Authors:
Li Xia,
Zhihui Yu,
Peter W. Glynn
Abstract:
This paper studies the optimization of Markov decision processes (MDPs) from a risk-seeking perspective, where the risk is measured by conditional value-at-risk (CVaR). The objective is to find a policy that maximizes the long-run CVaR of instantaneous rewards over an infinite horizon across all history-dependent randomized policies. By establishing two optimality inequalities of opposing directio…
▽ More
This paper studies the optimization of Markov decision processes (MDPs) from a risk-seeking perspective, where the risk is measured by conditional value-at-risk (CVaR). The objective is to find a policy that maximizes the long-run CVaR of instantaneous rewards over an infinite horizon across all history-dependent randomized policies. By establishing two optimality inequalities of opposing directions, we prove that the maximum of long-run CVaR of MDPs over the set of history-dependent randomized policies can be found within the class of stationary randomized policies. In contrast to classical MDPs, we find that there may not exist an optimal stationary deterministic policy for maximizing CVaR. Instead, we prove the existence of an optimal stationary randomized policy that requires randomizing over at most two actions. Via a convex optimization representation of CVaR, we convert the long-run CVaR maximization MDP into a minimax problem, where we prove the interchangeability of minimum and maximum and the related existence of saddle point solutions. Furthermore, we propose an algorithm that finds the saddle point solution by solving two linear programs. These results are then extended to objectives that involve maximizing some combination of mean and CVaR of rewards simultaneously. Finally, we conduct numerical experiments to demonstrate the main results.
△ Less
Submitted 3 December, 2023;
originally announced December 2023.
-
Neural-Optic Co-Designed Polarization-Multiplexed Metalens for Compact Computational Spectral Imaging
Authors:
Qiangbo Zhang,
Peicheng Lin,
Chang Wang,
Yang Zhang,
Zeqing Yu,
Xinyu Liu,
Ting Xu,
Zhenrong Zheng
Abstract:
As the realm of spectral imaging applications extends its reach into the domains of mobile technology and augmented reality, the demands for compact yet high-fidelity systems become increasingly pronounced. Conventional methodologies, exemplified by coded aperture snapshot spectral imaging systems, are significantly limited by their cumbersome physical dimensions and form factors. To address this…
▽ More
As the realm of spectral imaging applications extends its reach into the domains of mobile technology and augmented reality, the demands for compact yet high-fidelity systems become increasingly pronounced. Conventional methodologies, exemplified by coded aperture snapshot spectral imaging systems, are significantly limited by their cumbersome physical dimensions and form factors. To address this inherent challenge, diffractive optical elements (DOEs) have been repeatedly employed as a means to mitigate issues related to the bulky nature of these systems. Nonetheless, it's essential to note that the capabilities of DOEs primarily revolve around the modulation of the phase of light. Here, we introduce an end-to-end computational spectral imaging framework based on a polarization-multiplexed metalens. A distinguishing feature of this approach lies in its capacity to simultaneously modulate orthogonal polarization channels. When harnessed in conjunction with a neural network, it facilitates the attainment of high-fidelity spectral reconstruction. Importantly, the framework is intrinsically fully differentiable, a feature that permits the joint optimization of both the metalens structure and the parameters governing the neural network. The experimental results presented herein validate the exceptional spatial-spectral reconstruction performance, underscoring the efficacy of this system in practical, real-world scenarios. This innovative approach transcends the traditional boundaries separating hardware and software in the realm of computational imaging and holds the promise of substantially propelling the miniaturization of spectral imaging systems.
△ Less
Submitted 25 November, 2023;
originally announced November 2023.
-
HiFi-Syn: Hierarchical Granularity Discrimination for High-Fidelity Synthesis of MR Images with Structure Preservation
Authors:
Ziqi Yu,
Botao Zhao,
Shengjie Zhang,
Xiang Chen,
Jianfeng Feng,
Tingying Peng,
Xiao-Yong Zhang
Abstract:
Synthesizing medical images while preserving their structural information is crucial in medical research. In such scenarios, the preservation of anatomical content becomes especially important. Although recent advances have been made by incorporating instance-level information to guide translation, these methods overlook the spatial coherence of structural-level representation and the anatomical i…
▽ More
Synthesizing medical images while preserving their structural information is crucial in medical research. In such scenarios, the preservation of anatomical content becomes especially important. Although recent advances have been made by incorporating instance-level information to guide translation, these methods overlook the spatial coherence of structural-level representation and the anatomical invariance of content during translation. To address these issues, we introduce hierarchical granularity discrimination, which exploits various levels of semantic information present in medical images. Our strategy utilizes three levels of discrimination granularity: pixel-level discrimination using a Brain Memory Bank, structure-level discrimination on each brain structure with a re-weighting strategy to focus on hard samples, and global-level discrimination to ensure anatomical consistency during translation. The image translation performance of our strategy has been evaluated on three independent datasets (UK Biobank, IXI, and BraTS 2018), and it has outperformed state-of-the-art algorithms. Particularly, our model excels not only in synthesizing normal structures but also in handling abnormal (pathological) structures, such as brain tumors, despite the variations in contrast observed across different imaging modalities due to their pathological characteristics. The diagnostic value of synthesized MR images containing brain tumors has been evaluated by radiologists. This indicates that our model may offer an alternative solution in scenarios where specific MR modalities of patients are unavailable. Extensive experiments further demonstrate the versatility of our method, providing unique insights into medical image translation.
△ Less
Submitted 21 November, 2023;
originally announced November 2023.
-
Learning Decentralized Traffic Signal Controllers with Multi-Agent Graph Reinforcement Learning
Authors:
Yao Zhang,
Zhiwen Yu,
Jun Zhang,
Liang Wang,
Tom H. Luan,
Bin Guo,
Chau Yuen
Abstract:
This paper considers optimal traffic signal control in smart cities, which has been taken as a complex networked system control problem. Given the interacting dynamics among traffic lights and road networks, attaining controller adaptivity and scalability stands out as a primary challenge. Capturing the spatial-temporal correlation among traffic lights under the framework of Multi-Agent Reinforcem…
▽ More
This paper considers optimal traffic signal control in smart cities, which has been taken as a complex networked system control problem. Given the interacting dynamics among traffic lights and road networks, attaining controller adaptivity and scalability stands out as a primary challenge. Capturing the spatial-temporal correlation among traffic lights under the framework of Multi-Agent Reinforcement Learning (MARL) is a promising solution. Nevertheless, existing MARL algorithms ignore effective information aggregation which is fundamental for improving the learning capacity of decentralized agents. In this paper, we design a new decentralized control architecture with improved environmental observability to capture the spatial-temporal correlation. Specifically, we first develop a topology-aware information aggregation strategy to extract correlation-related information from unstructured data gathered in the road network. Particularly, we transfer the road network topology into a graph shift operator by forming a diffusion process on the topology, which subsequently facilitates the construction of graph signals. A diffusion convolution module is developed, forming a new MARL algorithm, which endows agents with the capabilities of graph learning. Extensive experiments based on both synthetic and real-world datasets verify that our proposal outperforms existing decentralized algorithms.
△ Less
Submitted 7 November, 2023;
originally announced November 2023.
-
INeAT: Iterative Neural Adaptive Tomography
Authors:
Bo Xiong,
Changqing Su,
Zihan Lin,
You Zhou,
Zhaofei Yu
Abstract:
Computed Tomography (CT) with its remarkable capability for three-dimensional imaging from multiple projections, enjoys a broad range of applications in clinical diagnosis, scientific observation, and industrial detection. Neural Adaptive Tomography (NeAT) is a recently proposed 3D rendering method based on neural radiance field for CT, and it demonstrates superior performance compared to traditio…
▽ More
Computed Tomography (CT) with its remarkable capability for three-dimensional imaging from multiple projections, enjoys a broad range of applications in clinical diagnosis, scientific observation, and industrial detection. Neural Adaptive Tomography (NeAT) is a recently proposed 3D rendering method based on neural radiance field for CT, and it demonstrates superior performance compared to traditional methods. However, it still faces challenges when dealing with the substantial perturbations and pose shifts encountered in CT scanning processes. Here, we propose a neural rendering method for CT reconstruction, named Iterative Neural Adaptive Tomography (INeAT), which incorporates iterative posture optimization to effectively counteract the influence of posture perturbations in data, particularly in cases involving significant posture variations. Through the implementation of a posture feedback optimization strategy, INeAT iteratively refines the posture corresponding to the input images based on the reconstructed 3D volume. We demonstrate that INeAT achieves artifact-suppressed and resolution-enhanced reconstruction in scenarios with significant pose disturbances. Furthermore, we show that our INeAT maintains comparable reconstruction performance to stable-state acquisitions even using data from unstable-state acquisitions, which significantly reduces the time required for CT scanning and relaxes the stringent requirements on imaging hardware systems, underscoring its immense potential for applications in short-time and low-cost CT technology.
△ Less
Submitted 2 November, 2023;
originally announced November 2023.
-
Numerical Derivative-based Flexible Integration Algorithm for Power Electronic Systems Simulation Considering Nonlinear Components
Authors:
Han Xu,
Bochen Shi,
Zhujun Yu,
Jialin Zheng,
Zhengming Zhao
Abstract:
Simulation is an efficient tool in the design and control of power electronic systems. However, quick and accurate simulation of them is still challenging, especially when the system contains a large number of switches and state variables. Conventional general-purpose integration algorithms assume nonlinearity within systems but face inefficiency in handling the piecewise characteristics of power…
▽ More
Simulation is an efficient tool in the design and control of power electronic systems. However, quick and accurate simulation of them is still challenging, especially when the system contains a large number of switches and state variables. Conventional general-purpose integration algorithms assume nonlinearity within systems but face inefficiency in handling the piecewise characteristics of power electronic switches. While some specialized algorithms can adapt to the piecewise characteristics, most of these methods require systems to be piecewise linear. In this article, a numerical derivative-based flexible integration algorithm is proposed. This algorithm can adapt to the piecewise characteristic caused by switches and have no difficulty when nonlinear non-switching components are present in the circuit. This algorithm consists of a recursive numerical scheme that obtains high-order time derivatives of nonlinear components and a decoupling strategy that further increases computational efficiency. The proposed method is applied to solve a motor derive system and a large-scale power conversion system (PCS) to verify its accuracy and efficiency by comparing experimental waveforms and simulated results given by commercial software. Our proposed method demonstrates several-fold acceleration compared to multiple commonly used algorithms in Simulink.
△ Less
Submitted 24 October, 2023;
originally announced October 2023.
-
Joint Music and Language Attention Models for Zero-shot Music Tagging
Authors:
Xingjian Du,
Zhesong Yu,
Jiaju Lin,
Bilei Zhu,
Qiuqiang Kong
Abstract:
Music tagging is a task to predict the tags of music recordings. However, previous music tagging research primarily focuses on close-set music tagging tasks which can not be generalized to new tags. In this work, we propose a zero-shot music tagging system modeled by a joint music and language attention (JMLA) model to address the open-set music tagging problem. The JMLA model consists of an audio…
▽ More
Music tagging is a task to predict the tags of music recordings. However, previous music tagging research primarily focuses on close-set music tagging tasks which can not be generalized to new tags. In this work, we propose a zero-shot music tagging system modeled by a joint music and language attention (JMLA) model to address the open-set music tagging problem. The JMLA model consists of an audio encoder modeled by a pretrained masked autoencoder and a decoder modeled by a Falcon7B. We introduce preceiver resampler to convert arbitrary length audio into fixed length embeddings. We introduce dense attention connections between encoder and decoder layers to improve the information flow between the encoder and decoder layers. We collect a large-scale music and description dataset from the internet. We propose to use ChatGPT to convert the raw descriptions into formalized and diverse descriptions to train the JMLA models. Our proposed JMLA system achieves a zero-shot audio tagging accuracy of $ 64.82\% $ on the GTZAN dataset, outperforming previous zero-shot systems and achieves comparable results to previous systems on the FMA and the MagnaTagATune datasets.
△ Less
Submitted 16 October, 2023;
originally announced October 2023.
-
Indoor Positioning based on Active Radar Sensing and Passive Reflectors: Concepts & Initial Results
Authors:
Pascal Schlachter,
Zhibin Yu,
Naveed Iqbal,
Xiaofeng Wu,
Sven Hinderer,
Bin Yang
Abstract:
To navigate reliably in indoor environments, an industrial autonomous vehicle must know its position. However, current indoor vehicle positioning technologies either lack accuracy, usability or are too expensive. Thus, we propose a novel concept called local reference point assisted active radar positioning, which is able to overcome these drawbacks. It is based on distributing passive retroreflec…
▽ More
To navigate reliably in indoor environments, an industrial autonomous vehicle must know its position. However, current indoor vehicle positioning technologies either lack accuracy, usability or are too expensive. Thus, we propose a novel concept called local reference point assisted active radar positioning, which is able to overcome these drawbacks. It is based on distributing passive retroreflectors in the indoor environment such that each position of the vehicle can be identified by a unique reflection characteristic regarding the reflectors. To observe these characteristics, the autonomous vehicle is equipped with an active radar system. On one hand, this paper presents the basic idea and concept of our new approach towards indoor vehicle positioning and especially focuses on the crucial placement of the reflectors. On the other hand, it also provides a proof of concept by conducting a full system simulation including the placement of the local reference points, the radar-based distance estimation and the comparison of two different positioning methods. It successfully demonstrates the feasibility of our proposed approach.
△ Less
Submitted 31 January, 2024; v1 submitted 6 October, 2023;
originally announced October 2023.
-
SSHNN: Semi-Supervised Hybrid NAS Network for Echocardiographic Image Segmentation
Authors:
Renqi Chen,
**g**g Luo,
Fan Nian,
Yuhui Cen,
Yiheng Peng,
Zekuan Yu
Abstract:
Accurate medical image segmentation especially for echocardiographic images with unmissable noise requires elaborate network design. Compared with manual design, Neural Architecture Search (NAS) realizes better segmentation results due to larger search space and automatic optimization, but most of the existing methods are weak in layer-wise feature aggregation and adopt a ``strong encoder, weak de…
▽ More
Accurate medical image segmentation especially for echocardiographic images with unmissable noise requires elaborate network design. Compared with manual design, Neural Architecture Search (NAS) realizes better segmentation results due to larger search space and automatic optimization, but most of the existing methods are weak in layer-wise feature aggregation and adopt a ``strong encoder, weak decoder" structure, insufficient to handle global relationships and local details. To resolve these issues, we propose a novel semi-supervised hybrid NAS network for accurate medical image segmentation termed SSHNN. In SSHNN, we creatively use convolution operation in layer-wise feature fusion instead of normalized scalars to avoid losing details, making NAS a stronger encoder. Moreover, Transformers are introduced for the compensation of global context and U-shaped decoder is designed to efficiently connect global context with local features. Specifically, we implement a semi-supervised algorithm Mean-Teacher to overcome the limited volume problem of labeled medical image dataset. Extensive experiments on CAMUS echocardiography dataset demonstrate that SSHNN outperforms state-of-the-art approaches and realizes accurate segmentation. Code will be made publicly available.
△ Less
Submitted 27 December, 2023; v1 submitted 8 September, 2023;
originally announced September 2023.
-
Task Offloading Optimization in Mobile Edge Computing under Uncertain Processing Cycles and Intermittent Communications
Authors:
Tao Deng,
Zhanwei Yu,
Di Yuan
Abstract:
Mobile edge computing (MEC) has been regarded as a promising approach to deal with explosive computation requirements by enabling cloud computing capabilities at the edge of networks. Existing models of MEC impose some strong assumptions on the known processing cycles and unintermittent communications. However, practical MEC systems are constrained by various uncertainties and intermittent communi…
▽ More
Mobile edge computing (MEC) has been regarded as a promising approach to deal with explosive computation requirements by enabling cloud computing capabilities at the edge of networks. Existing models of MEC impose some strong assumptions on the known processing cycles and unintermittent communications. However, practical MEC systems are constrained by various uncertainties and intermittent communications, rendering these assumptions impractical. In view of this, we investigate how to schedule task offloading in MEC systems with uncertainties. First, we derive a closed-form expression of the average offloading success probability in a device-to-device (D2D) assisted MEC system with uncertain computation processing cycles and intermittent communications. Then, we formulate a task offloading maximization problem (TOMP), and prove that the problem is NP-hard. For problem solving, if the problem instance exhibits a symmetric structure, we propose a task scheduling algorithm based on dynamic programming (TSDP). By solving this problem instance, we derive a bound to benchmark sub-optimal algorithm. For general scenarios, by reformulating the problem, we propose a repeated matching algorithm (RMA). Finally, in performance evaluations, we validate the accuracy of the closed-form expression of the average offloading success probability by Monte Carlo simulations, as well as the effectiveness of the proposed algorithms.
△ Less
Submitted 7 October, 2023; v1 submitted 8 September, 2023;
originally announced September 2023.
-
A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading
Authors:
Ruihuai Liang,
Bo Yang,
Zhiwen Yu,
Xuelin Cao,
Derrick Wing Kwan Ng,
Chau Yuen
Abstract:
Computation offloading has become a popular solution to support computationally intensive and latency-sensitive applications by transferring computing tasks to mobile edge servers (MESs) for execution, which is known as mobile/multi-access edge computing (MEC). To improve the MEC performance, it is required to design an optimal offloading strategy that includes offloading decision (i.e., whether o…
▽ More
Computation offloading has become a popular solution to support computationally intensive and latency-sensitive applications by transferring computing tasks to mobile edge servers (MESs) for execution, which is known as mobile/multi-access edge computing (MEC). To improve the MEC performance, it is required to design an optimal offloading strategy that includes offloading decision (i.e., whether offloading or not) and computational resource allocation of MEC. The design can be formulated as a mixed-integer nonlinear programming (MINLP) problem, which is generally NP-hard and its effective solution can be obtained by performing online inference through a well-trained deep neural network (DNN) model. However, when the system environments change dynamically, the DNN model may lose efficacy due to the drift of input parameters, thereby decreasing the generalization ability of the DNN model. To address this unique challenge, in this paper, we propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs). Specifically, the shared backbone will be invariant during the PHs training and the inferred results will be ensembled, thereby significantly reducing the required training overhead and improving the inference performance. As a result, the joint optimization problem for offloading decision and resource allocation can be efficiently solved even in a time-varying wireless environment. Experimental results show that the proposed MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
△ Less
Submitted 2 September, 2023;
originally announced September 2023.
-
Multi-Stage Expansion Planning for Decarbonizing Thermal Generation Supported Renewable Power Systems Using Hydrogen and Ammonia Storage
Authors:
Zhipeng Yu,
** Lin,
Feng Liu,
Jiarong Li,
Yingtian Chi,
Yonghua Song,
Zhengwei Ren
Abstract:
Large-scale centralized development of wind and solar energy and peer-to-grid transmission of renewable energy source (RES) via high voltage direct current (HVDC) has been regarded as one of the most promising ways to achieve goals of peak carbon and carbon neutrality in China. Traditionally, large-scale thermal generation is needed to economically support the load demand of HVDC with a given prof…
▽ More
Large-scale centralized development of wind and solar energy and peer-to-grid transmission of renewable energy source (RES) via high voltage direct current (HVDC) has been regarded as one of the most promising ways to achieve goals of peak carbon and carbon neutrality in China. Traditionally, large-scale thermal generation is needed to economically support the load demand of HVDC with a given profile, which in turn raises concerns about carbon emissions. To address the issues above, hydrogen energy storage system (HESS) and ammonia energy storage system (AESS) are introduced to gradually replace thermal generation, which is represented as a multi-stage expansion planning (MSEP) problem. Specifically, first, HESS and AESS are established in the MSEP model with carbon emission reduction constraints, and yearly data with hourly time resolution are utilized for each stage to well describe the intermittence of RES. Then, a combined Dantzig-Wolfe decomposition (DWD) and column generation (CG) solution approach is proposed to efficiently solve the large-scale MSEP model. Finally, a real-life system in China is studied. The results indicate that HESS and AESS have the potential to handle the intermittence of RES, as well as the monthly imbalance between RES and load demand. Especially under the goal of carbon neutrality, the contribution of HESS and AESS in reducing levelized cost of energy (LCOE) reaches 12.28% and 14.59%, respectively, which finally leads to a LCOE of 0.4324 RMB/kWh.
△ Less
Submitted 31 August, 2023;
originally announced August 2023.
-
MultiPA: A Multi-task Speech Pronunciation Assessment Model for Open Response Scenarios
Authors:
Yu-Wen Chen,
Zhou Yu,
Julia Hirschberg
Abstract:
Pronunciation assessment models designed for open response scenarios enable users to practice language skills in a manner similar to real-life communication. However, previous open-response pronunciation assessment models have predominantly focused on a single pronunciation task, such as sentence-level accuracy, rather than offering a comprehensive assessment in various aspects. We propose MultiPA…
▽ More
Pronunciation assessment models designed for open response scenarios enable users to practice language skills in a manner similar to real-life communication. However, previous open-response pronunciation assessment models have predominantly focused on a single pronunciation task, such as sentence-level accuracy, rather than offering a comprehensive assessment in various aspects. We propose MultiPA, a Multitask Pronunciation Assessment model that provides sentence-level accuracy, fluency, prosody, and word-level accuracy assessment for open responses. We examined the correlation between different pronunciation tasks and showed the benefits of multi-task learning. Our model reached the state-of-the-art performance on existing in-domain data sets and effectively generalized to an out-of-domain dataset that we newly collected. The experimental results demonstrate the practical utility of our model in real-world applications.
△ Less
Submitted 4 June, 2024; v1 submitted 23 August, 2023;
originally announced August 2023.
-
META-SELD: Meta-Learning for Fast Adaptation to the new environment in Sound Event Localization and Detection
Authors:
**bo Hu,
Yin Cao,
Ming Wu,
Feiran Yang,
Ziying Yu,
Wenwu Wang,
Mark D. Plumbley,
Jun Yang
Abstract:
For learning-based sound event localization and detection (SELD) methods, different acoustic environments in the training and test sets may result in large performance differences in the validation and evaluation stages. Different environments, such as different sizes of rooms, different reverberation times, and different background noise, may be reasons for a learning-based system to fail. On the…
▽ More
For learning-based sound event localization and detection (SELD) methods, different acoustic environments in the training and test sets may result in large performance differences in the validation and evaluation stages. Different environments, such as different sizes of rooms, different reverberation times, and different background noise, may be reasons for a learning-based system to fail. On the other hand, acquiring annotated spatial sound event samples, which include onset and offset time stamps, class types of sound events, and direction-of-arrival (DOA) of sound sources is very expensive. In addition, deploying a SELD system in a new environment often poses challenges due to time-consuming training and fine-tuning processes. To address these issues, we propose Meta-SELD, which applies meta-learning methods to achieve fast adaptation to new environments. More specifically, based on Model Agnostic Meta-Learning (MAML), the proposed Meta-SELD aims to find good meta-initialized parameters to adapt to new environments with only a small number of samples and parameter updating iterations. We can then quickly adapt the meta-trained SELD model to unseen environments. Our experiments compare fine-tuning methods from pre-trained SELD models with our Meta-SELD on the Sony-TAU Realistic Spatial Soundscapes 2023 (STARSSS23) dataset. The evaluation results demonstrate the effectiveness of Meta-SELD when adapting to new environments.
△ Less
Submitted 17 August, 2023;
originally announced August 2023.
-
QoE-Driven Video Transmission: Energy-Efficient Multi-UAV Network Optimization
Authors:
Kesong Wu,
Xianbin Cao,
Peng Yang,
Zongyang Yu,
Dapeng Oliver Wu,
Tony Q. S. Quek
Abstract:
This paper is concerned with the issue of improving video subscribers' quality of experience (QoE) by deploying a multi-unmanned aerial vehicle (UAV) network. Different from existing works, we characterize subscribers' QoE by video bitrates, latency, and frame freezing and propose to improve their QoE by energy-efficiently and dynamically optimizing the multi-UAV network in terms of serving UAV se…
▽ More
This paper is concerned with the issue of improving video subscribers' quality of experience (QoE) by deploying a multi-unmanned aerial vehicle (UAV) network. Different from existing works, we characterize subscribers' QoE by video bitrates, latency, and frame freezing and propose to improve their QoE by energy-efficiently and dynamically optimizing the multi-UAV network in terms of serving UAV selection, UAV trajectory, and UAV transmit power. The dynamic multi-UAV network optimization problem is formulated as a challenging sequential-decision problem with the goal of maximizing subscribers' QoE while minimizing the total network power consumption, subject to some physical resource constraints. We propose a novel network optimization algorithm to solve this challenging problem, in which a Lyapunov technique is first explored to decompose the sequential-decision problem into several repeatedly optimized sub-problems to avoid the curse of dimensionality. To solve the sub-problems, iterative and approximate optimization mechanisms with provable performance guarantees are then developed. Finally, we design extensive simulations to verify the effectiveness of the proposed algorithm. Simulation results show that the proposed algorithm can effectively improve the QoE of subscribers and is 66.75\% more energy-efficient than benchmarks.
△ Less
Submitted 23 July, 2023;
originally announced July 2023.
-
3D Medical Image Segmentation based on multi-scale MPU-Net
Authors:
Zeqiu. Yu,
Shuo. Han,
Ziheng. Song
Abstract:
The high cure rate of cancer is inextricably linked to physicians' accuracy in diagnosis and treatment, therefore a model that can accomplish high-precision tumor segmentation has become a necessity in many applications of the medical industry. It can effectively lower the rate of misdiagnosis while considerably lessening the burden on clinicians. However, fully automated target organ segmentation…
▽ More
The high cure rate of cancer is inextricably linked to physicians' accuracy in diagnosis and treatment, therefore a model that can accomplish high-precision tumor segmentation has become a necessity in many applications of the medical industry. It can effectively lower the rate of misdiagnosis while considerably lessening the burden on clinicians. However, fully automated target organ segmentation is problematic due to the irregular stereo structure of 3D volume organs. As a basic model for this class of real applications, U-Net excels. It can learn certain global and local features, but still lacks the capacity to grasp spatial long-range relationships and contextual information at multiple scales. This paper proposes a tumor segmentation model MPU-Net for patient volume CT images, which is inspired by Transformer with a global attention mechanism. By combining image serialization with the Position Attention Module, the model attempts to comprehend deeper contextual dependencies and accomplish precise positioning. Each layer of the decoder is also equipped with a multi-scale module and a cross-attention mechanism. The capability of feature extraction and integration at different levels has been enhanced, and the hybrid loss function developed in this study can better exploit high-resolution characteristic information. Moreover, the suggested architecture is tested and evaluated on the Liver Tumor Segmentation Challenge 2017 (LiTS 2017) dataset. Compared with the benchmark model U-Net, MPU-Net shows excellent segmentation results. The dice, accuracy, precision, specificity, IOU, and MCC metrics for the best model segmentation results are 92.17%, 99.08%, 91.91%, 99.52%, 85.91%, and 91.74%, respectively. Outstanding indicators in various aspects illustrate the exceptional performance of this framework in automatic medical image segmentation.
△ Less
Submitted 24 July, 2023; v1 submitted 11 July, 2023;
originally announced July 2023.
-
Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular Learning
Authors:
Zhongzhi Yu,
Yang Zhang,
Kaizhi Qian,
Yonggan Fu,
Yingyan Lin
Abstract:
Despite the impressive performance recently achieved by automatic speech recognition (ASR), we observe two primary challenges that hinder its broader applications: (1) The difficulty of introducing scalability into the model to support more languages with limited training, inference, and storage overhead; (2) The low-resource adaptation ability that enables effective low-resource adaptation while…
▽ More
Despite the impressive performance recently achieved by automatic speech recognition (ASR), we observe two primary challenges that hinder its broader applications: (1) The difficulty of introducing scalability into the model to support more languages with limited training, inference, and storage overhead; (2) The low-resource adaptation ability that enables effective low-resource adaptation while avoiding over-fitting and catastrophic forgetting issues. Inspired by recent findings, we hypothesize that we can address the above challenges with modules widely shared across languages. To this end, we propose an ASR framework, dubbed \METHODNS, that, \textit{for the first time}, simultaneously achieves strong multilingual scalability and low-resource adaptation ability thanks to its modularize-then-assemble strategy. Specifically, \METHOD learns a small set of generalizable sub-modules and adaptively assembles them for different languages to reduce the multilingual overhead and enable effective knowledge transfer for low-resource adaptation. Extensive experiments and visualizations demonstrate that \METHOD can effectively discover language similarity and improve multilingual and low-resource ASR performance over state-of-the-art (SOTA) methods, e.g., under multilingual-ASR, our framework achieves a 0.13$\sim$2.41 lower character error rate (CER) with 30\% smaller inference overhead over SOTA solutions on multilingual ASR and a comparable CER, with nearly 50 times fewer trainable parameters over SOTA solutions on low-resource tuning, respectively.
△ Less
Submitted 23 June, 2023;
originally announced June 2023.
-
Robust Divergence Angle for Inter-satellite Laser Communications under Target Deviation Uncertainty
Authors:
Zhanwei Yu,
Yi Zhao,
Di Yuan
Abstract:
Performance degradation due to target deviation by, for example, drift or jitter, presents a significant issue to inter-satellite laser communications. In particular, with periodic acquisition for positioning the satellite receiver, deviation may arise in the time period between two consecutive acquisition operations. One solution to mitigate the issue is to use a divergence angle at the transmitt…
▽ More
Performance degradation due to target deviation by, for example, drift or jitter, presents a significant issue to inter-satellite laser communications. In particular, with periodic acquisition for positioning the satellite receiver, deviation may arise in the time period between two consecutive acquisition operations. One solution to mitigate the issue is to use a divergence angle at the transmitter being wider than that if the receiver position is perfectly known. However, as how the deviation would vary over time is generally very hard to predict or model, there is no clear clue for setting the divergence angle. We propose a robust optimization approach to the problem, with the advantage that no distribution of the deviation need to be modelled. Instead, a so-called uncertainty set (often defined in form of a convex set such as a polytope) is used, where each element represents a possible scenario, i.e., a sequence of deviation values over time. Robust optimization seeks the solution that maximizes the performance (e.g., sum rate) that can be guaranteed, no matter which scenario in the uncertainty set materializes. To solve the robust optimization problem, we deploy a process of alternately solving a decision maker's problem and an adversarial problem. The former optimizes the divergence angle for a subset of the uncertainty set, whereas the latter is used to explore if the subset needs to be augmented. Simulation results show the approach leads to significantly more robust performance than using the divergence angle as if there is no deviation, or other ad-hoc schemes.
△ Less
Submitted 13 May, 2023;
originally announced June 2023.
-
DEMIST: A deep-learning-based task-specific denoising approach for myocardial perfusion SPECT
Authors:
Md Ashequr Rahman,
Zitong Yu,
Richard Laforest,
Craig K. Abbey,
Barry A. Siegel,
Abhinav K. Jha
Abstract:
There is an important need for methods to process myocardial perfusion imaging (MPI) SPECT images acquired at lower radiation dose and/or acquisition time such that the processed images improve observer performance on the clinical task of detecting perfusion defects. To address this need, we build upon concepts from model-observer theory and our understanding of the human visual system to propose…
▽ More
There is an important need for methods to process myocardial perfusion imaging (MPI) SPECT images acquired at lower radiation dose and/or acquisition time such that the processed images improve observer performance on the clinical task of detecting perfusion defects. To address this need, we build upon concepts from model-observer theory and our understanding of the human visual system to propose a Detection task-specific deep-learning-based approach for denoising MPI SPECT images (DEMIST). The approach, while performing denoising, is designed to preserve features that influence observer performance on detection tasks. We objectively evaluated DEMIST on the task of detecting perfusion defects using a retrospective study with anonymized clinical data in patients who underwent MPI studies across two scanners (N = 338). The evaluation was performed at low-dose levels of 6.25%, 12.5% and 25% and using an anthropomorphic channelized Hotelling observer. Performance was quantified using area under the receiver operating characteristics curve (AUC). Images denoised with DEMIST yielded significantly higher AUC compared to corresponding low-dose images and images denoised with a commonly used task-agnostic DL-based denoising method. Similar results were observed with stratified analysis based on patient sex and defect type. Additionally, DEMIST improved visual fidelity of the low-dose images as quantified using root mean squared error and structural similarity index metric. A mathematical analysis revealed that DEMIST preserved features that assist in detection tasks while improving the noise properties, resulting in improved observer performance. The results provide strong evidence for further clinical evaluation of DEMIST to denoise low-count images in MPI SPECT.
△ Less
Submitted 25 October, 2023; v1 submitted 7 June, 2023;
originally announced June 2023.
-
Breast Cancer Immunohistochemical Image Generation: a Benchmark Dataset and Challenge Review
Authors:
Chuang Zhu,
Shengjie Liu,
Zekuan Yu,
Feng Xu,
Arpit Aggarwal,
Germán Corredor,
Anant Madabhushi,
Qixun Qu,
Hongwei Fan,
Fangda Li,
Yueheng Li,
Xianchao Guan,
Yongbing Zhang,
Vivek Kumar Singh,
Farhan Akram,
Md. Mostafa Kamal Sarker,
Zhongyue Shi,
Mulan **
Abstract:
For invasive breast cancer, immunohistochemical (IHC) techniques are often used to detect the expression level of human epidermal growth factor receptor-2 (HER2) in breast tissue to formulate a precise treatment plan. From the perspective of saving manpower, material and time costs, directly generating IHC-stained images from Hematoxylin and Eosin (H&E) stained images is a valuable research direct…
▽ More
For invasive breast cancer, immunohistochemical (IHC) techniques are often used to detect the expression level of human epidermal growth factor receptor-2 (HER2) in breast tissue to formulate a precise treatment plan. From the perspective of saving manpower, material and time costs, directly generating IHC-stained images from Hematoxylin and Eosin (H&E) stained images is a valuable research direction. Therefore, we held the breast cancer immunohistochemical image generation challenge, aiming to explore novel ideas of deep learning technology in pathological image generation and promote research in this field. The challenge provided registered H&E and IHC-stained image pairs, and participants were required to use these images to train a model that can directly generate IHC-stained images from corresponding H&E-stained images. We selected and reviewed the five highest-ranking methods based on their PSNR and SSIM metrics, while also providing overviews of the corresponding pipelines and implementations. In this paper, we further analyze the current limitations in the field of breast cancer immunohistochemical image generation and forecast the future development of this field. We hope that the released dataset and the challenge will inspire more scholars to jointly study higher-quality IHC-stained image generation.
△ Less
Submitted 22 September, 2023; v1 submitted 5 May, 2023;
originally announced May 2023.