-
Adaptive Self-Supervised Consistency-Guided Diffusion Model for Accelerated MRI Reconstruction
Authors:
Mojtaba Safari,
Zach Eidex,
Shaoyan Pan,
Richard L. J. Qiu,
Xiaofeng Yang
Abstract:
Purpose: To propose a self-supervised deep learning-based compressed sensing MRI (DL-based CS-MRI) method named "Adaptive Self-Supervised Consistency Guided Diffusion Model (ASSCGD)" to accelerate data acquisition without requiring fully sampled datasets. Materials and Methods: We used the fastMRI multi-coil brain axial T2-weighted (T2-w) dataset from 1,376 cases and single-coil brain quantitative magnetization prepared 2 rapid acquisition gradient echoes (MP2RAGE) T1 maps from 318 cases to train and test our model. Robustness against domain shift was evaluated using two out-of-distribution (OOD) datasets: a multi-coil brain axial postcontrast T1-weighted (T1c) dataset from 50 cases and an axial T1-weighted (T1-w) dataset from 50 patients. Data were retrospectively subsampled at acceleration rates R in {2x, 4x, 8x}. ASSCGD partitions a random sampling pattern into two disjoint sets, ensuring data consistency during training. We compared our method with ReconFormer Transformer and SS-MRI, assessing performance using normalized mean squared error (NMSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM). Statistical tests included one-way analysis of variance (ANOVA) and multi-comparison Tukey's Honestly Significant Difference (HSD) tests. Results: ASSCGD preserved fine structures and brain abnormalities visually better than comparative methods at R = 8x for both multi-coil and single-coil datasets. It achieved the lowest NMSE at R in {4x, 8x}, and the highest PSNR and SSIM values at all acceleration rates for the multi-coil dataset. Similar trends were observed for the single-coil dataset, though SSIM values were comparable to ReconFormer at R in {2x, 8x}. These results were further confirmed by the voxel-wise correlation scatter plots. OOD results showed significant (p << 10^-5) improvements in undersampled image quality after reconstruction.
Submitted 21 June, 2024;
originally announced June 2024.
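The mask partitioning mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a 1D random line-sampling pattern and a 60/40 split; the function names, the split fraction, and the roles assigned to the two subsets are hypothetical and are not taken from the paper.

```python
import random


def random_undersampling(num_lines, acceleration, seed=0):
    """Pick num_lines // acceleration k-space line indices at random (assumed 1D pattern)."""
    rng = random.Random(seed)
    keep = max(1, num_lines // acceleration)
    return sorted(rng.sample(range(num_lines), keep))


def partition_mask(sampled_indices, input_fraction=0.6, seed=0):
    """Split the sampled indices into two disjoint subsets.

    In self-supervised schemes of this kind, one subset is typically fed to
    the network and the other is held out for the loss; that usage is an
    assumption here, as is the 60/40 split.
    """
    rng = random.Random(seed)
    shuffled = list(sampled_indices)
    rng.shuffle(shuffled)
    cut = int(round(input_fraction * len(shuffled)))
    return sorted(shuffled[:cut]), sorted(shuffled[cut:])


# Example: 256 phase-encoding lines at R = 4x.
sampled = random_undersampling(num_lines=256, acceleration=4)
input_set, loss_set = partition_mask(sampled)
```

The two subsets never share an index, and together they recover the original sampling pattern, which is the property the abstract refers to as a disjoint partition.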
-
SCKansformer: Fine-Grained Classification of Bone Marrow Cells via Kansformer Backbone and Hierarchical Attention Mechanisms
Authors:
Yifei Chen,
Zhu Zhu,
Shenghao Zhu,
Linwei Qiu,
Binfeng Zou,
Fan Jia,
Yunpeng Zhu,
Chenyan Zhang,
Zhaojie Fang,
Feiwei Qin,
** Fan,
Changmiao Wang,
Yu Gao,
Gang Yu
Abstract:
The incidence and mortality rates of malignant tumors, such as acute leukemia, have risen significantly. Clinically, hospitals rely on cytological examination of peripheral blood and bone marrow smears to diagnose malignant tumors, with accurate blood cell counting being crucial. Existing automated methods face challenges such as low feature expression capability, poor interpretability, and redundant feature extraction when processing high-dimensional microimage data. We propose a novel fine-grained classification model, SCKansformer, for bone marrow blood cells, which addresses these challenges and enhances classification accuracy and efficiency. The model integrates the Kansformer Encoder, SCConv Encoder, and Global-Local Attention Encoder. The Kansformer Encoder replaces the traditional MLP layer with the KAN, improving nonlinear feature representation and interpretability. The SCConv Encoder, with its Spatial and Channel Reconstruction Units, enhances feature representation and reduces redundancy. The Global-Local Attention Encoder combines Multi-head Self-Attention with a Local Part module to capture both global and local features. We validated our model using the Bone Marrow Blood Cell Fine-Grained Classification Dataset (BMCD-FGCD), comprising over 10,000 samples and nearly 40 classifications, developed with a partner hospital. Comparative experiments on our private dataset, as well as the publicly available PBC and ALL-IDB datasets, demonstrate that SCKansformer outperforms both typical and advanced microcell classification methods across all datasets. Our source code and private BMCD-FGCD dataset are available at https://github.com/JustlfC03/SCKansformer.
Submitted 14 June, 2024;
originally announced June 2024.
-
Secure Communications in Near-Field ISCAP Systems with Extremely Large-Scale Antenna Arrays
Authors:
Zixiang Ren,
Siyao Zhang,
Xinmin Li,
Ling Qiu,
Jie Xu,
Derrick Wing Kwan Ng
Abstract:
This paper investigates secure communications in a near-field multi-functional integrated sensing, communication, and powering (ISCAP) system with an extremely large-scale antenna array (ELAA) equipped at the base station (BS). In this system, the BS sends confidential messages to a single communication user (CU), and at the same time wirelessly senses a point target and charges multiple energy receivers (ERs). It is assumed that the ERs and the sensing target are potential eavesdroppers that may attempt to intercept the confidential messages intended for the CU. We consider the joint transmit beamforming design to support secure communications while ensuring the sensing and powering requirements. In particular, the BS transmits dedicated sensing/energy beams in addition to the information beam, which also play the role of artificial noise (AN) for effectively jamming potential eavesdroppers. Building upon this, we maximize the secrecy rate at the CU, subject to the maximum Cramér-Rao bound (CRB) constraints for target sensing and the minimum harvested energy constraints for the ERs. Although the formulated joint beamforming problem is non-convex and challenging to solve, we acquire the optimal solution via the semi-definite relaxation (SDR) and fractional programming techniques together with a one-dimensional (1D) search. Subsequently, we present two alternative designs based on zero-forcing (ZF) beamforming and maximum ratio transmission (MRT), respectively. Finally, our numerical results show that our proposed approaches exploit both the distance-domain resolution of near-field ELAA and the joint beamforming design for enhancing secure communication performance while ensuring the sensing and powering requirements in ISCAP, especially when the CU and the target and ER eavesdroppers are located at the same angle (but different distances) with respect to the BS.
Submitted 22 May, 2024;
originally announced May 2024.
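For readers unfamiliar with the objective named in the abstract above, the secrecy rate is commonly written in the following textbook form. The symbols are assumptions introduced here for illustration: $\gamma_{\mathrm{CU}}$ is the SINR at the CU, $\gamma_k$ the SINR at the $k$-th potential eavesdropper (an ER or the sensing target), and $\mathcal{K}$ the set of potential eavesdroppers. The paper's exact formulation may differ.

```latex
R_{\mathrm{sec}}
  = \Big[ \log_2\!\big(1 + \gamma_{\mathrm{CU}}\big)
        - \max_{k \in \mathcal{K}} \log_2\!\big(1 + \gamma_{k}\big) \Big]^{+},
\qquad [x]^{+} = \max(x, 0)
```

Under this form, artificial noise helps by lowering $\gamma_k$ at the eavesdroppers, which raises the achievable secrecy rate.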
-
Mammo-CLIP: Leveraging Contrastive Language-Image Pre-training (CLIP) for Enhanced Breast Cancer Diagnosis with Multi-view Mammography
Authors:
Xuxin Chen,
Yuheng Li,
Mingzhe Hu,
Ella Salari,
Xiaoqian Chen,
Richard L. J. Qiu,
Bin Zheng,
Xiaofeng Yang
Abstract:
Although fusion of information from multiple views of mammograms plays an important role in increasing the accuracy of breast cancer detection, developing multi-view mammogram-based computer-aided diagnosis (CAD) schemes still faces challenges, and no such CAD schemes have been used in clinical practice. To overcome the challenges, we investigate a new approach based on Contrastive Language-Image Pre-training (CLIP), which has sparked interest across various medical imaging tasks. By solving the challenges in (1) effectively adapting the single-view CLIP for multi-view feature fusion and (2) efficiently fine-tuning this parameter-dense model with limited samples and computational resources, we introduce Mammo-CLIP, the first multi-modal framework to process multi-view mammograms and corresponding simple texts. Mammo-CLIP uses an early feature fusion strategy to learn multi-view relationships in four mammograms acquired from the CC and MLO views of the left and right breasts. To enhance learning efficiency, plug-and-play adapters are added into CLIP image and text encoders for fine-tuning parameters and limiting updates to about 1% of the parameters. For framework evaluation, we assembled two datasets retrospectively. The first dataset, comprising 470 malignant and 479 benign cases, was used for few-shot fine-tuning and internal evaluation of the proposed Mammo-CLIP via 5-fold cross-validation. The second dataset, including 60 malignant and 294 benign cases, was used to test generalizability of Mammo-CLIP. Study results show that Mammo-CLIP outperforms the state-of-the-art cross-view transformer in AUC (0.841 vs. 0.817, 0.837 vs. 0.807) on both datasets. It also surpasses two previous CLIP-based methods by 20.3% and 14.3%. This study highlights the potential of applying fine-tuned vision-language models for developing next-generation, image-text-based CAD schemes for breast cancer.
Submitted 24 April, 2024;
originally announced April 2024.
-
Feedback Stability Under Mixed Gain and Phase Uncertainty
Authors:
Jia** Liang,
Di Zhao,
Li Qiu
Abstract:
In this study, we investigate the robust feedback stability problem for multiple-input-multiple-output linear time-invariant systems involving sectored-disk uncertainty, namely, dynamic uncertainty subject to simultaneous gain and phase constraints. This problem is thereby called a sectored-disk problem. Employing a frequency-wise analysis approach, we derive a fundamental static matrix problem that serves as a key component in addressing the feedback stability. The study of this matrix problem heavily relies on the Davis-Wielandt (DW) shells of matrices, providing a profound insight into matrices subjected to simultaneous gain and phase constraints. This understanding is pivotal for establishing a less conservative sufficient condition for the matrix sectored-disk problem, from which we formulate several robust feedback stability conditions against sectored-disk uncertainty. Finally, several conditions based on linear matrix inequalities are developed for efficient computation and verification of feedback robust stability against sectored-disk uncertainty.
Submitted 8 April, 2024;
originally announced April 2024.
-
Energy-Efficient Hybrid Beamforming with Dynamic On-off Control for Integrated Sensing, Communications, and Powering
Authors:
Zeyu Hao,
Yuan Fang,
Xianghao Yu,
Jie Xu,
Ling Qiu,
Lexi Xu,
Shuguang Cui
Abstract:
This paper investigates the energy-efficient hybrid beamforming design for a multi-functional integrated sensing, communications, and powering (ISCAP) system. In this system, a base station (BS) with a hybrid analog-digital (HAD) architecture sends unified wireless signals to communicate with multiple information receivers (IRs), sense multiple point targets, and wirelessly charge multiple energy receivers (ERs) at the same time. To facilitate the energy-efficient design, we present a novel HAD architecture for the BS transmitter, which allows dynamic on-off control of its radio frequency (RF) chains and analog phase shifters (PSs) through a switch network. We also consider a practical and comprehensive power consumption model for the BS, by taking into account the power-dependent non-linear power amplifier (PA) efficiency, and the on-off non-transmission power consumption model of RF chains and PSs. We jointly design the hybrid beamforming and dynamic on-off control at the BS, aiming to minimize its total power consumption, while guaranteeing the performance requirements on communication rates, sensing Cramér-Rao bound (CRB), and harvested power levels. The formulation also takes into consideration the per-antenna transmit power constraint and the constant modulus constraints for the analog beamformer at the BS. The resulting optimization problem for ISCAP is highly non-convex. Please refer to the paper for a complete abstract.
Submitted 24 March, 2024;
originally announced March 2024.
-
Vision Transformer-based Multimodal Feature Fusion Network for Lymphoma Segmentation on PET/CT Images
Authors:
Huan Huang,
Liheng Qiu,
Shenmiao Yang,
Longxi Li,
Jiaofen Nan,
Yanting Li,
Chuang Han,
Fubao Zhu,
Chen Zhao,
Weihua Zhou
Abstract:
Background: Diffuse large B-cell lymphoma (DLBCL) segmentation is a challenge in medical image analysis. Traditional segmentation methods for lymphoma struggle with the complex patterns and the presence of DLBCL lesions. Objective: We aim to develop an accurate method for lymphoma segmentation with 18F-Fluorodeoxyglucose positron emission tomography (PET) and computed tomography (CT) images. Methods: Our lymphoma segmentation approach combines a vision transformer with dual encoders, adeptly fusing PET and CT data via multimodal cross-attention fusion (MMCAF) module. In this study, PET and CT data from 165 DLBCL patients were analyzed. A 5-fold cross-validation was employed to evaluate the performance and generalization ability of our method. Ground truths were annotated by experienced nuclear medicine experts. We calculated the total metabolic tumor volume (TMTV) and performed a statistical analysis on our results. Results: The proposed method exhibited accurate performance in DLBCL lesion segmentation, achieving a Dice similarity coefficient of 0.9173$\pm$0.0071, a Hausdorff distance of 2.71$\pm$0.25mm, a sensitivity of 0.9462$\pm$0.0223, and a specificity of 0.9986$\pm$0.0008. Additionally, a Pearson correlation coefficient of 0.9030$\pm$0.0179 and an R-square of 0.8586$\pm$0.0173 were observed in TMTV when measured on manual annotation compared to our segmentation results. Conclusion: This study highlights the advantages of MMCAF and vision transformer for lymphoma segmentation using PET and CT, offering great promise for computer-aided lymphoma diagnosis and treatment.
Submitted 4 February, 2024;
originally announced February 2024.
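The overlap metrics reported in the abstract above (Dice similarity coefficient, sensitivity, specificity) can be computed from a predicted and a reference binary mask as in the following sketch. The masks are flattened lists of 0/1 voxels; the toy data and function names are illustrative assumptions and do not come from the study.

```python
def confusion_counts(pred, truth):
    """Count true/false positives and negatives for two equal-length binary masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    return tp, fp, fn, tn


def dice(pred, truth):
    """Dice = 2*TP / (2*TP + FP + FN); defined as 1.0 when both masks are empty."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def sensitivity(pred, truth):
    """Fraction of lesion voxels that were detected: TP / (TP + FN)."""
    tp, _, fn, _ = confusion_counts(pred, truth)
    return tp / (tp + fn) if (tp + fn) else 1.0


def specificity(pred, truth):
    """Fraction of background voxels correctly left unlabeled: TN / (TN + FP)."""
    _, fp, _, tn = confusion_counts(pred, truth)
    return tn / (tn + fp) if (tn + fp) else 1.0


# Toy six-voxel example (hypothetical data).
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 0, 1, 1]
scores = (dice(pred_mask, true_mask),
          sensitivity(pred_mask, true_mask),
          specificity(pred_mask, true_mask))
```

Specificity tends to sit very close to 1 in lesion segmentation because background voxels vastly outnumber lesion voxels, which is why Dice is usually the more informative of the three.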
-
EEGFormer: Towards Transferable and Interpretable Large-Scale EEG Foundation Model
Authors:
Yuqi Chen,
Kan Ren,
Kaitao Song,
Yansen Wang,
Yifan Wang,
Dongsheng Li,
Lili Qiu
Abstract:
Self-supervised learning has emerged as a highly effective approach in the fields of natural language processing and computer vision. It is also applicable to brain signals such as electroencephalography (EEG) data, given the abundance of available unlabeled data that exist in a wide spectrum of real-world medical applications ranging from seizure detection to wave analysis. The existing works leveraging self-supervised learning on EEG modeling mainly focus on pretraining upon each individual dataset corresponding to a single downstream task, which cannot leverage the power of abundant data, and they may derive sub-optimal solutions with a lack of generalization. Moreover, these methods rely on end-to-end model learning which is not easy for humans to understand. In this paper, we present a novel EEG foundation model, namely EEGFormer, pretrained on large-scale compound EEG data. The pretrained model can not only learn universal representations on EEG signals with adaptable performance on various downstream tasks but also provide interpretable outcomes of the useful patterns within the data. To validate the effectiveness of our model, we extensively evaluate it on various downstream tasks and assess the performance under different transfer settings. Furthermore, we demonstrate how the learned model exhibits transferable anomaly detection performance and provides valuable interpretability of the acquired patterns via self-supervised learning.
Submitted 11 January, 2024;
originally announced January 2024.
-
Secure Cell-Free Integrated Sensing and Communication in the Presence of Information and Sensing Eavesdroppers
Authors:
Zixiang Ren,
Jie Xu,
Ling Qiu,
Derrick Wing Kwan Ng
Abstract:
This paper studies a secure cell-free integrated sensing and communication (ISAC) system, in which multiple ISAC transmitters collaboratively send confidential information to multiple communication users (CUs) and concurrently conduct target detection. Different from prior works investigating communication security against potential information eavesdropping, we consider the security of both communication and sensing in the presence of both information and sensing eavesdroppers that aim to intercept confidential communication information and extract target information, respectively. Towards this end, we optimize the joint information and sensing transmit beamforming at these ISAC transmitters for secure cell-free ISAC. Our objective is to maximize the detection probability over a designated sensing area while ensuring the minimum signal-to-interference-plus-noise-ratio (SINR) requirements at CUs. Our formulation also takes into account the maximum tolerable signal-to-noise ratio (SNR) at information eavesdroppers for ensuring the confidentiality of information transmission, and the maximum detection probability constraints at sensing eavesdroppers for preserving sensing privacy. The formulated secure joint transmit beamforming problem is highly non-convex due to the intricate interplay between the detection probabilities, beamforming vectors, and SINR constraints. Fortunately, through strategic manipulation and via applying the semidefinite relaxation (SDR) technique, we successfully obtain the globally optimal solution to the design problem by rigorously verifying the tightness of SDR. Furthermore, we present two alternative joint beamforming designs based on the sensing SNR maximization over the specific sensing area and the coordinated beamforming, respectively. Numerical results reveal the benefits of our proposed design over these alternative benchmarks.
Submitted 7 December, 2023;
originally announced December 2023.
-
A Cyclic Small Phase Theorem
Authors:
Chao Chen,
Wei Chen,
Di Zhao,
Jianqi Chen,
Li Qiu
Abstract:
This paper introduces a brand-new phase definition called the segmental phase for multi-input multi-output linear time-invariant systems. The underpinning of the definition lies in the matrix segmental phase which, as its name implies, is graphically based on the smallest circular segment covering the matrix normalized numerical range in the unit disk. The matrix segmental phase has the crucial product eigen-phase bound, which makes itself stand out from several existing phase notions in the literature. The proposed bound paves the way for stability analysis of a cyclic feedback system consisting of multiple subsystems. A cyclic small phase theorem is then established as our main result, which requires the loop system phase to lie between $-π$ and $π$. The proposed theorem complements a cyclic version of the celebrated small gain theorem. In addition, a generalization of the proposed theorem is made via the use of angular scaling techniques for reducing conservatism.
Submitted 1 December, 2023;
originally announced December 2023.
-
Phase Preservation of N-Port Networks under General Connections
Authors:
Jianqi Chen,
Wei Chen,
Chao Chen,
Li Qiu
Abstract:
This study first introduces the frequency-wise phases of n-port linear time-invariant networks based on recently defined phases of complex matrices. Such a phase characterization can be used to quantify the well-known notion of passivity for networks. Further, a class of matrix operations induced by fairly common n-port network connections is examined. The intrinsic phase properties of networks under such connections are preserved. Concretely, a scalable phase-preserving criterion is proposed, which involves only the phase properties of individual subnetworks, under the matrix operations featured by connections. This criterion ensures that the phase range of the integrated network can be verified effectively and that the scalability of the analyses can be maintained. In addition, the inverse operations of the considered connections, that is, network subtractions with correspondences are examined. With the known phase ranges of the integrated network and one of its subnetworks, the maximal allowable phase range of the remaining subnetwork can also be determined explicitly in a unified form for all types of subtractions. Finally, we extend the phase-preserving properties from the aforementioned connections to more general matrix operations defined using a certain indefinite inner product.
Submitted 28 November, 2023;
originally announced November 2023.
-
Image-Domain Material Decomposition for Dual-energy CT using Unsupervised Learning with Data-fidelity Loss
Authors:
Junbo Peng,
Chih-Wei Chang,
Huiqiao Xie,
Richard L. J. Qiu,
Justin Roper,
Tonghe Wang,
Beth Bradshaw,
Xiangyang Tang,
Xiaofeng Yang
Abstract:
Background: Dual-energy CT (DECT) and material decomposition play vital roles in quantitative medical imaging. However, the decomposition process may suffer from significant noise amplification, leading to severely degraded image signal-to-noise ratios (SNRs). While existing iterative algorithms perform noise suppression using different image priors, these heuristic image priors cannot accurately represent the features of the target image manifold. Although deep learning-based decomposition methods have been reported, these methods are in the supervised-learning framework requiring paired data for training, which is not readily available in clinical settings.
Purpose: This work aims to develop an unsupervised-learning framework with data-measurement consistency for image-domain material decomposition in DECT.
Submitted 17 November, 2023;
originally announced November 2023.
-
Sensing-Assisted Sparse Channel Recovery for Massive Antenna Systems
Authors:
Zixiang Ren,
Ling Qiu,
Jie Xu,
Derrick Wing Kwan Ng
Abstract:
This correspondence presents a novel sensing-assisted sparse channel recovery approach for massive antenna wireless communication systems. We focus on a fundamental configuration with one massive-antenna base station (BS) and one single-antenna communication user (CU). The wireless channel exhibits sparsity and consists of multiple paths associated with scatterers detectable via radar sensing. Under this setup, the BS first sends downlink pilots to the CU and concurrently receives the echo pilot signals for sensing the surrounding scatterers. Subsequently, the CU sends feedback information on its received pilot signal to the BS. Accordingly, the BS determines the sparse basis based on the sensed scatterers and proceeds to recover the wireless channel, exploiting the feedback information based on advanced compressive sensing (CS) algorithms. Numerical results show that the proposed sensing-assisted approach achieves a significantly higher overall achievable rate than the conventional design relying on a discrete Fourier transform (DFT)-based sparse basis without sensing, thanks to the reduced training overhead and enhanced recovery accuracy with limited feedback.
Submitted 10 November, 2023;
originally announced November 2023.
-
Joint Angle and Delay Cramér-Rao Bound Optimization for Integrated Sensing and Communications
Authors:
Chao Hu,
Yuan Fang,
Ling Qiu
Abstract:
In this paper, we study a multi-input multi-output (MIMO) beamforming design in an integrated sensing and communication (ISAC) system, in which an ISAC base station (BS) is used to communicate with multiple downlink users and simultaneously the communication signals are reused for sensing multiple targets. Our interested sensing parameters are the angle and delay information of the targets, which can be used to locate these targets. Under this consideration, we first derive the Cramér-Rao bound (CRB) for angle and delay estimation. Then, we optimize the transmit beamforming at the BS to minimize the CRB, subject to communication rate and power constraints. In particular, we obtain the optimal solution in closed-form in the case of single-target and single-user, and in the case of multi-target and multi-user scenario, the sparsity of the optimal solution is proven, leading to a reduction in computational complexity during optimization. The numerical results demonstrate that the optimized beamforming yields excellent positioning performance and effectively reduces the requirement for a large number of antennas at the BS.
Submitted 13 November, 2023; v1 submitted 9 November, 2023;
originally announced November 2023.
-
A Dual Attentive Generative Adversarial Network for Remote Sensing Image Change Detection
Authors:
Luyi Qiu,
Xiaofeng Zhang,
ChaoChen Gu,
and ShanYing Zhu
Abstract:
Remote sensing change detection between bi-temporal images is receiving growing attention from researchers. However, comparing two bi-temporal images to detect changes is challenging, as they differ in appearance. In this paper, we propose a dual attentive generative adversarial network (DAGAN) for very high-resolution remote sensing image change detection, which regards the detection model as a generator and attains the optimal weights of the detection model without increasing its parameters through a generative-adversarial strategy, boosting the spatial contiguity of predictions. Moreover, we design a multi-level feature extractor for effectively fusing multi-level features, which adopts the pre-trained model to extract multi-level features from bi-temporal images and introduces aggregate connections to fuse them. To strengthen the identification of multi-scale objects, we propose a multi-scale adaptive fusion module to adaptively fuse multi-scale features through various receptive fields and design a context refinement module to explore contextual dependencies. Moreover, the DAGAN framework uses a 4-layer convolutional network as a discriminator to identify whether the synthetic image is fake or real. Extensive experiments show that the DAGAN framework outperforms advanced methods on the LEVIR dataset, with 85.01% mean IoU and 91.48% mean F1 score.
Submitted 3 October, 2023;
originally announced October 2023.
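The mean IoU and mean F1 figures quoted in the abstract above are typically averaged over the two classes of a binary change map (changed and unchanged). The sketch below assumes that per-class averaging convention, which the abstract does not specify; the data and function names are hypothetical.

```python
def class_iou_f1(pred, truth, cls):
    """IoU and F1 for one class label, treating `cls` as the positive class."""
    tp = sum(1 for p, t in zip(pred, truth) if p == cls and t == cls)
    fp = sum(1 for p, t in zip(pred, truth) if p == cls and t != cls)
    fn = sum(1 for p, t in zip(pred, truth) if p != cls and t == cls)
    union = tp + fp + fn
    iou = tp / union if union else 1.0
    f1 = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    return iou, f1


def mean_iou_f1(pred, truth, classes=(0, 1)):
    """Average the per-class IoU and F1 over all classes (assumed convention)."""
    per_class = [class_iou_f1(pred, truth, c) for c in classes]
    mean_iou = sum(iou for iou, _ in per_class) / len(per_class)
    mean_f1 = sum(f1 for _, f1 in per_class) / len(per_class)
    return mean_iou, mean_f1


# Toy six-pixel change maps: 1 = changed, 0 = unchanged (hypothetical data).
pred_map = [1, 1, 0, 0, 1, 0]
true_map = [1, 0, 0, 0, 1, 1]
mean_iou, mean_f1 = mean_iou_f1(pred_map, true_map)
```

For any single class F1 is never lower than IoU, since F1 = 2·IoU / (1 + IoU), which is consistent with the mean F1 in the abstract exceeding the mean IoU.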
-
Efficient Remote Sensing Segmentation With Generative Adversarial Transformer
Authors:
Luyi Qiu,
Dayu Yu,
Xiaofeng Zhang,
Chenxiao Zhang
Abstract:
Most deep learning methods that achieve high segmentation accuracy require deep network architectures that are too heavy and complex to run on embedded devices with limited storage and memory space. To address this issue, this paper proposes an efficient Generative Adversarial Transformer (GATrans) for achieving high-precision semantic segmentation while maintaining an extremely efficient size. The framework utilizes a Global Transformer Network (GTNet) as the generator, efficiently extracting multi-level features through residual connections. GTNet employs global transformer blocks with progressively linear computational complexity to reassign global features based on a learnable similarity function. To focus on object-level and pixel-level information, GATrans optimizes the objective function by combining structural similarity losses. We validate the effectiveness of our approach through extensive experiments on the Vaihingen dataset, achieving an average F1 score of 90.17% and an overall accuracy of 91.92%.
Submitted 2 October, 2023;
originally announced October 2023.
-
An Anchor-Point Based Image-Model for Room Impulse Response Simulation with Directional Source Radiation and Sensor Directivity Patterns
Authors:
Chao Pan,
Lei Zhang,
Yilong Lu,
Jilu Jin,
Lin Qiu,
Jingdong Chen,
Jacob Benesty
Abstract:
The image model method has been widely used to simulate room impulse responses, and the endeavor to adapt this method to different applications has also attracted great interest over the last few decades. This paper attempts to extend the image model method and develops an anchor-point-image-model (APIM) approach as a solution for simulating impulse responses by including both the source radiation and sensor directivity patterns. To determine the orientations of all the virtual sources, anchor points are introduced to real sources, which subsequently lead to the determination of the orientations of the virtual sources. An algorithm is developed to generate room impulse responses with APIM by taking into account the directional pattern functions, fractional time delays, as well as the computational complexity. The developed model and algorithms can be used in various acoustic problems to simulate room acoustics and to improve and evaluate processing algorithms.
Submitted 21 August, 2023;
originally announced August 2023.
-
Training-Free Energy Beamforming Assisted by Wireless Sensing
Authors:
Li Zhang,
Yuan Fang,
Zixiang Ren,
Ling Qiu,
Jie Xu
Abstract:
This paper studies the transmit energy beamforming in a multi-antenna wireless power transfer (WPT) system, in which an access point (AP) equipped with a uniform linear array (ULA) sends radio signals to wirelessly charge multiple single-antenna energy receivers (ERs). Different from conventional energy beamforming designs that require the AP to acquire the channel state information (CSI) via training and feedback, we propose a new training-free energy beamforming approach assisted by wireless radar sensing, which is implemented based on the following two-stage protocol. In the first stage, the AP performs wireless radar sensing to estimate the path gain and angle parameters of the ERs for constructing the corresponding CSI. In the second stage, the AP implements the transmit energy beamforming based on the constructed CSI to efficiently charge these ERs in a fair manner. Under this setup, first, we jointly optimize the sensing beamformers and duration in the first stage to minimize the sensing duration, while ensuring a given accuracy threshold for parameter estimation subject to the maximum transmit power constraint at the AP. Next, we optimize the energy beamformers in the second stage to maximize the minimum energy harvested by all ERs. In this approach, the estimation accuracy threshold for the first stage is properly designed to balance the resource allocation between the two stages for optimizing the ultimate energy harvesting performance. Finally, numerical results show that the proposed training-free energy beamforming design performs close to the performance upper bound with perfect CSI, and outperforms both the benchmark schemes without such joint optimization and the scheme with isotropic transmission.
Submitted 18 August, 2023;
originally announced August 2023.
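The second stage described in the entry above (constructing CSI from sensed path gain and angle, then beamforming on it) can be sketched as follows. This is a minimal illustration that assumes a line-of-sight channel, half-wavelength antenna spacing, and received power proportional to |h^H w|^2; none of these modelling details, nor the function names, come from the abstract. It evaluates the max-min objective for a given beamformer but does not solve the optimization problem itself.

```python
import cmath
import math


def ula_steering(num_antennas, angle_rad, spacing_wavelengths=0.5):
    """Far-field steering vector of a uniform linear array (assumed model)."""
    phase = -2j * math.pi * spacing_wavelengths * math.sin(angle_rad)
    return [cmath.exp(phase * n) for n in range(num_antennas)]


def construct_csi(path_gain, angle_rad, num_antennas):
    """Build a channel vector from an estimated path gain and angle."""
    return [path_gain * a for a in ula_steering(num_antennas, angle_rad)]


def matched_beamformer(channel, transmit_power):
    """Beamformer aligned with one channel, scaled to the power budget."""
    norm = math.sqrt(sum(abs(h) ** 2 for h in channel))
    return [math.sqrt(transmit_power) * h / norm for h in channel]


def harvested_power(channel, beamformer):
    """Received power |h^H w|^2 under a linear harvesting assumption."""
    inner = sum(h.conjugate() * w for h, w in zip(channel, beamformer))
    return abs(inner) ** 2


def min_harvested_power(channels, beamformer):
    """Fairness objective: the worst-case power over all receivers."""
    return min(harvested_power(h, beamformer) for h in channels)


if __name__ == "__main__":
    h1 = construct_csi(0.5, 0.0, 4)            # receiver at broadside
    h2 = construct_csi(0.5, math.pi / 6, 4)    # receiver at 30 degrees
    w = matched_beamformer(h1, transmit_power=2.0)
    print(harvested_power(h1, w), min_harvested_power([h1, h2], w))
```

Steering all power at one receiver leaves the other close to a null in this example, which is why a fairness-oriented design optimizes the minimum over receivers rather than a single link.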
-
Seeing through the Brain: Image Reconstruction of Visual Perception from Human Brain Signals
Authors:
Yu-Ting Lan,
Kan Ren,
Yansen Wang,
Wei-Long Zheng,
Dongsheng Li,
Bao-Liang Lu,
Lili Qiu
Abstract:
Seeing is believing; however, the underlying mechanism of how human visual perceptions are intertwined with our cognitions is still a mystery. Thanks to the recent advances in both neuroscience and artificial intelligence, we have been able to record the visually evoked brain activities and mimic the visual perception ability through computational approaches. In this paper, we focus on visual stimuli reconstruction, that is, reconstructing the observed images from portably accessible brain signals, i.e., electroencephalography (EEG) data. Since EEG signals are dynamic time series and are notoriously noisy, processing them and extracting useful information requires dedicated effort. We propose a comprehensive pipeline, named NeuroImagen, for reconstructing visual stimuli images from EEG signals. Specifically, we incorporate a novel multi-level perceptual information decoding to draw multi-grained outputs from the given EEG data. A latent diffusion model then leverages the extracted information to reconstruct the high-resolution visual stimuli images. The experimental results illustrate the effectiveness of image reconstruction and the superior quantitative performance of our proposed method.
Submitted 16 August, 2023; v1 submitted 27 July, 2023;
originally announced August 2023.
-
Fundamental CRB-Rate Tradeoff in Multi-Antenna ISAC Systems with Information Multicasting and Multi-Target Sensing
Authors:
Zixiang Ren,
Yunfei Peng,
Xianxin Song,
Yuan Fang,
Ling Qiu,
Liang Liu,
Derrick Wing Kwan Ng,
Jie Xu
Abstract:
This paper investigates the performance tradeoff for a multi-antenna integrated sensing and communication (ISAC) system with simultaneous information multicasting and multi-target sensing, in which a multi-antenna base station (BS) sends the common information messages to a set of single-antenna communication users (CUs) and estimates the parameters of multiple sensing targets based on the echo signals concurrently. We consider two target sensing scenarios without and with prior target knowledge at the BS, in which the BS is interested in estimating the complete multi-target response matrix and the target reflection coefficients/angles, respectively. First, we consider the capacity-achieving transmission and characterize the fundamental tradeoff between the achievable rate and the multi-target estimation Cramér-Rao bound (CRB) accordingly.
Submitted 21 July, 2023;
originally announced July 2023.
-
Protecting the Future: Neonatal Seizure Detection with Spatial-Temporal Modeling
Authors:
Ziyue Li,
Yuchen Fang,
You Li,
Kan Ren,
Yansen Wang,
Xufang Luo,
Juanyong Duan,
Congrui Huang,
Dongsheng Li,
Lili Qiu
Abstract:
Timely detection of seizures in newborn infants with electroencephalogram (EEG) has been a common yet life-saving practice in the Neonatal Intensive Care Unit (NICU). However, it requires great human effort for real-time monitoring, which calls for automated solutions to neonatal seizure detection. Moreover, the current automated methods focusing on adult epilepsy monitoring often fail due to (i) dynamic seizure onset location in human brains; (ii) different montages on neonates; and (iii) huge distribution shift among different subjects. In this paper, we propose a deep learning framework, namely STATENet, to address these unique challenges with dedicated designs at the temporal, spatial and model levels. Experiments on a real-world large-scale neonatal EEG dataset show that our framework achieves significantly better seizure detection performance.
Submitted 2 July, 2023;
originally announced July 2023.
-
End-to-End Word-Level Pronunciation Assessment with MASK Pre-training
Authors:
Yukang Liang,
Kaitao Song,
Shaoguang Mao,
Huiqiang Jiang,
Luna Qiu,
Yuqing Yang,
Dongsheng Li,
Linli Xu,
Lili Qiu
Abstract:
Pronunciation assessment is a major challenge in computer-aided pronunciation training systems, especially at the word (phoneme) level. To obtain word (phoneme)-level scores, current methods usually rely on aligning components to obtain acoustic features of each word (phoneme), which limits the performance of assessment to the accuracy of alignments. Therefore, to address this problem, we propose a simple yet effective method, namely Masked pre-training for Pronunciation Assessment (MPA). Specifically, by incorporating a mask-predict strategy, our MPA supports end-to-end training without leveraging any aligning components and can solve misalignment issues to a large extent during prediction. Furthermore, we design two evaluation strategies to enable our model to conduct assessments in both unsupervised and supervised settings. Experimental results on the SpeechOcean762 dataset demonstrate that MPA can achieve better performance than previous methods, without any explicit alignment. Nevertheless, MPA still has some limitations, such as requiring more inference time and reference text. We expect these to be addressed in future work.
Submitted 5 June, 2023;
originally announced June 2023.
-
Synthetic CT Generation from MRI using 3D Transformer-based Denoising Diffusion Model
Authors:
Shaoyan Pan,
Elham Abouei,
Jacob Wynne,
Tonghe Wang,
Richard L. J. Qiu,
Yuheng Li,
Chih-Wei Chang,
Junbo Peng,
Justin Roper,
Pretesh Patel,
David S. Yu,
Hui Mao,
Xiaofeng Yang
Abstract:
Magnetic resonance imaging (MRI)-based synthetic computed tomography (sCT) simplifies radiation therapy treatment planning by eliminating the need for CT simulation and error-prone image registration, ultimately reducing patient radiation dose and setup uncertainty. We propose an MRI-to-CT transformer-based denoising diffusion probabilistic model (MC-DDPM) to transform MRI into high-quality sCT to facilitate radiation treatment planning. MC-DDPM implements diffusion processes with a shifted-window transformer network to generate sCT from MRI. The proposed model consists of two processes: a forward process, which adds Gaussian noise to real CT scans, and a reverse process, in which a shifted-window transformer V-net (Swin-Vnet) denoises the noisy CT scans conditioned on the MRI from the same patient to produce noise-free CT scans. With an optimally trained Swin-Vnet, the reverse diffusion process was used to generate sCT scans matching MRI anatomy. We evaluated the proposed method by generating sCT from MRI on a brain dataset and a prostate dataset. Quantitative evaluation was performed using the mean absolute error (MAE) of Hounsfield unit (HU), peak signal-to-noise ratio (PSNR), multi-scale structural similarity index (MS-SSIM) and normalized cross correlation (NCC) indexes between ground truth CTs and sCTs. MC-DDPM generated brain sCTs with state-of-the-art quantitative results with MAE 43.317 HU, PSNR 27.046 dB, SSIM 0.965, and NCC 0.983. For the prostate dataset, MC-DDPM achieved MAE 59.953 HU, PSNR 26.920 dB, SSIM 0.849, and NCC 0.948. In conclusion, we have developed and validated a novel approach for generating CT images from routine MRIs using a transformer-based DDPM. This model effectively captures the complex relationship between CT and MRI images, allowing for robust and high-quality sCT images to be generated in minutes.
Submitted 30 May, 2023;
originally announced May 2023.
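The forward process mentioned in the MC-DDPM entry above (adding Gaussian noise to real CT scans) can be sketched in closed form. This is a minimal illustration assuming the standard DDPM formulation, x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, with a linear beta schedule; the schedule endpoints, the function names, and the flattened-list image representation are illustrative assumptions and are not taken from the abstract. The reverse (denoising) network is not shown.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (commonly used defaults, assumed here)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_t), usually written alpha-bar_t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t from q(x_t | x_0) for a flattened image x0 at step t."""
    a = alpha_bars[t]
    signal, noise = math.sqrt(a), math.sqrt(1.0 - a)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    clean = [0.2, -0.5, 0.9, 0.0]   # stand-in for normalized CT voxels
    print(q_sample(clean, 999, alpha_bars, rng))
```

Because alpha-bar shrinks toward zero as t grows, late steps are almost pure noise, which is what lets the reverse process start from a Gaussian sample.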
-
Cross-Shaped Windows Transformer with Self-supervised Pretraining for Clinically Significant Prostate Cancer Detection in Bi-parametric MRI
Authors:
Yuheng Li,
Jacob Wynne,
Jing Wang,
Richard L. J. Qiu,
Justin Roper,
Shaoyan Pan,
Ashesh B. Jani,
Tian Liu,
Pretesh R. Patel,
Hui Mao,
Xiaofeng Yang
Abstract:
Biparametric magnetic resonance imaging (bpMRI) has demonstrated promising results in prostate cancer (PCa) detection using convolutional neural networks (CNNs). Recently, transformers have achieved competitive performance compared to CNNs in computer vision. Large-scale transformers need abundant annotated data for training, which is difficult to obtain in medical imaging. Self-supervised learning (SSL) utilizes unlabeled data to generate meaningful semantic representations without the need for costly annotations, enhancing model performance on tasks with limited labeled data. We introduce a novel end-to-end Cross-Shaped windows (CSwin) transformer UNet model, CSwin UNet, to detect clinically significant prostate cancer (csPCa) in prostate bpMRI and demonstrate the effectiveness of our proposed self-supervised pre-training framework. Using a large prostate bpMRI dataset with 1500 patients, we first pretrain the CSwin transformer using multi-task self-supervised learning to improve data efficiency and network generalizability. We then finetune using lesion annotations to perform csPCa detection. Five-fold cross validation shows that self-supervised CSwin UNet achieves 0.888 AUC and 0.545 Average Precision (AP), significantly outperforming four comparable models (Swin UNETR, DynUNet, Attention UNet, UNet). Using a separate bpMRI dataset with 158 patients, we evaluate the robustness of our method to external hold-out data. Self-supervised CSwin UNet achieves 0.79 AUC and 0.45 AP, still outperforming all other comparable methods and demonstrating good generalization to external data.
Submitted 17 March, 2024; v1 submitted 30 April, 2023;
originally announced May 2023.
-
Cycle-guided Denoising Diffusion Probability Model for 3D Cross-modality MRI Synthesis
Authors:
Shaoyan Pan,
Chih-Wei Chang,
Junbo Peng,
Jiahan Zhang,
Richard L. J. Qiu,
Tonghe Wang,
Justin Roper,
Tian Liu,
Hui Mao,
Xiaofeng Yang
Abstract:
This study aims to develop a novel Cycle-guided Denoising Diffusion Probability Model (CG-DDPM) for cross-modality MRI synthesis. The CG-DDPM deploys two DDPMs that condition each other to generate synthetic images from two different MRI pulse sequences. The two DDPMs exchange random latent noise in the reverse processes, which helps to regularize both DDPMs and generate matching images in two modalities. This improves image-to-image translation accuracy. We evaluated the CG-DDPM quantitatively using mean absolute error (MAE), multi-scale structural similarity index measure (MSSIM), and peak signal-to-noise ratio (PSNR), as well as the network synthesis consistency, on the BraTS2020 dataset. Our proposed method showed high accuracy and reliable consistency for MRI synthesis. In addition, we compared the CG-DDPM with several other state-of-the-art networks and demonstrated statistically significant improvements in the image quality of synthetic MRIs. The proposed method enhances the capability of current multimodal MRI synthesis approaches, which could contribute to more accurate diagnosis and better treatment planning for patients by synthesizing additional MRI modalities.
Submitted 28 April, 2023;
originally announced May 2023.
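Two of the image-quality metrics named in the entry above, MAE and PSNR, have simple standard definitions that can be sketched directly. The sketch below assumes images flattened to lists of floats and a caller-supplied intensity range for PSNR; the function names and the handling of identical images are illustrative choices, and the abstract does not specify how the authors normalized intensities.

```python
import math


def mean_absolute_error(reference, synthetic):
    """Average absolute voxel difference between two flattened images."""
    return sum(abs(r - s) for r, s in zip(reference, synthetic)) / len(reference)


def peak_signal_to_noise_ratio(reference, synthetic, data_range):
    """PSNR in dB: 10 * log10(data_range^2 / MSE)."""
    mse = sum((r - s) ** 2 for r, s in zip(reference, synthetic)) / len(reference)
    if mse == 0.0:
        return math.inf   # identical images have unbounded PSNR
    return 10.0 * math.log10(data_range ** 2 / mse)


if __name__ == "__main__":
    real = [0.0, 1.0, 1.0, 0.0]
    fake = [0.0, 1.0, 0.5, 0.0]
    print(mean_absolute_error(real, fake))
    print(peak_signal_to_noise_ratio(real, fake, data_range=1.0))
```

PSNR depends on the chosen `data_range`, so values are only comparable across papers when the same intensity normalization is used.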
-
Rapid online solution of inverse heat transfer problem by ANN-based extended Kalman smoothing algorithm
Authors:
Xinxin Zhang,
Dike Li,
Jianqin Zhu,
Zhi Tao,
Lu Qiu
Abstract:
The digital twin is a modern technology for many advanced applications. To construct a digital twin of a thermal system, unknown time-varying boundary conditions must be estimated online from sensor-measured data, which requires solving inverse heat transfer problems (IHTPs). However, a fast and accurate solution is challenging, since the measured data are normally contaminated with noise and the traditional methods for solving IHTPs involve a significant amount of computation. Therefore, in this work, a rapid yet robust inversion algorithm called the ANN-based extended Kalman smoothing algorithm is developed to realize online prediction of the desired parameters based on the measurements. The fast prediction is realized by replacing the conventional CFD-based state transfer models in the extended Kalman smoothing algorithm with a pre-trained ANN. Then, a two-dimensional internal convective heat transfer problem is employed as a case study to test the algorithm. The results show that the proposed algorithm is a computationally light and robust approach for solving IHTPs. The proposed algorithm can estimate unknown boundary conditions with a dimensionless average error of 0.0580 under noisy temperature measurements with a standard deviation of 10 K, with a drastic reduction of computational cost compared to the conventional approach. Moreover, the effects of the training data, the sensor location, and the future time step selection on prediction performance are investigated.
Submitted 31 March, 2023;
originally announced April 2023.
-
Online Streaming Video Super-Resolution with Convolutional Look-Up Table
Authors:
Guanghao Yin,
Zefan Qu,
Xinyang Jiang,
Shan Jiang,
Zhenhua Han,
Ningxin Zheng,
Xiaohong Liu,
Huan Yang,
Yuqing Yang,
Dongsheng Li,
Lili Qiu
Abstract:
Online video streaming has fundamental limitations on the transmission bandwidth and computational capacity, and super-resolution is a promising potential solution. However, applying existing video super-resolution methods to online streaming is non-trivial. Existing video codecs and streaming protocols (e.g., WebRTC) dynamically change the video quality both spatially and temporally, which leads to diverse and dynamic degradations. Furthermore, online streaming has a strict latency requirement, which makes most existing methods less applicable. As a result, this paper focuses on the rarely explored problem setting of online streaming video super-resolution. To facilitate research on this problem, a new benchmark dataset named LDV-WebRTC is constructed based on a real-world online streaming system. Leveraging the new benchmark dataset, we propose a novel method specifically for online video streaming, which contains a convolution and Look-Up Table (LUT) hybrid model to achieve a better performance-latency trade-off. To tackle the changing degradations, we propose a mixture-of-expert-LUT module, where a set of LUTs specialized in different degradations are built and adaptively combined to handle different degradations. Experiments show that our method achieves 720P video SR at around 100 FPS, while significantly outperforming existing LUT-based methods and offering competitive performance compared to efficient CNN-based methods.
Submitted 25 July, 2023; v1 submitted 1 March, 2023;
originally announced March 2023.
-
Deep Biological Pathway Informed Pathology-Genomic Multimodal Survival Prediction
Authors:
Lin Qiu,
Aminollah Khormali,
Kai Liu
Abstract:
The integration of multi-modal data, such as pathological images and genomic data, is essential for understanding cancer heterogeneity and complexity for personalized treatments, as well as for enhancing survival predictions. Despite the progress made in integrating pathology and genomic data, most existing methods cannot mine the complex inter-modality relations thoroughly. Additionally, identifying explainable features from these models that govern preclinical discovery and clinical prediction is crucial for cancer diagnosis, prognosis, and therapeutic response studies. We propose PONET, a novel biological pathway-informed pathology-genomic deep model that integrates pathological images and genomic data not only to improve survival prediction but also to identify genes and pathways that cause different survival rates in patients. Empirical results on six of The Cancer Genome Atlas (TCGA) datasets show that our proposed method achieves superior predictive performance and reveals meaningful biological interpretations. The proposed method establishes insight into how to train biologically informed deep networks on multimodal biomedical data, which will have general applicability for understanding diseases and predicting response and resistance to treatment.
Submitted 6 January, 2023;
originally announced January 2023.
-
Attentive Mask CLIP
Authors:
Yifan Yang,
Weiquan Huang,
Yixuan Wei,
Houwen Peng,
Xinyang Jiang,
Huiqiang Jiang,
Fangyun Wei,
Yin Wang,
Han Hu,
Lili Qiu,
Yuqing Yang
Abstract:
Image token removal is an efficient augmentation strategy for reducing the cost of computing image features. However, this efficient augmentation strategy has been found to adversely affect the accuracy of CLIP-based training. We hypothesize that removing a large portion of image tokens may improperly discard the semantic content associated with a given text description, thus constituting an incorrect pairing target in CLIP training. To address this issue, we propose an attentive token removal approach for CLIP training, which retains tokens with a high semantic correlation to the text description. The correlation scores are computed in an online fashion using the EMA version of the visual encoder. Our experiments show that the proposed attentive masking approach performs better than the previous method of random token removal for CLIP training. The approach also makes it efficient to apply multiple augmentation views to the image, as well as introducing instance contrastive learning tasks between these views into the CLIP framework. Compared to other CLIP improvements that combine different pre-training targets such as SLIP and MaskCLIP, our method is not only more effective, but also much more efficient. Specifically, using ViT-B and YFCC-15M dataset, our approach achieves $43.9\%$ top-1 accuracy on ImageNet-1K zero-shot classification, as well as $62.7/42.1$ and $38.0/23.2$ I2T/T2I retrieval accuracy on Flickr30K and MS COCO, which are $+1.1\%$, $+5.5/+0.9$, and $+4.4/+1.3$ higher than the SLIP method, while being $2.30\times$ faster. An efficient version of our approach running $1.16\times$ faster than the plain CLIP model achieves significant gains of $+5.3\%$, $+11.3/+8.0$, and $+9.5/+4.9$ on these benchmarks.
Submitted 9 October, 2023; v1 submitted 16 December, 2022;
originally announced December 2022.
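The two mechanisms named in the Attentive Mask CLIP entry above, retaining the image tokens most correlated with the text and maintaining an EMA copy of the visual encoder, can be sketched in isolation. The sketch takes precomputed per-token scores as input because the abstract does not say how the correlation scores are derived from the encoder; the function names, the keep ratio, and the decay value are hypothetical choices made for illustration.

```python
def keep_top_tokens(scores, keep_ratio):
    """Return indices of the highest-scoring tokens, in their original order."""
    num_keep = max(1, round(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:num_keep])


def ema_update(ema_params, online_params, decay=0.999):
    """Exponential moving average of parameters (decay value is an assumption)."""
    return [decay * e + (1.0 - decay) * o for e, o in zip(ema_params, online_params)]


if __name__ == "__main__":
    token_scores = [0.1, 0.9, 0.3, 0.7]   # hypothetical token-to-text correlations
    print(keep_top_tokens(token_scores, keep_ratio=0.5))
    print(ema_update([1.0, 2.0], [0.0, 4.0], decay=0.9))
```

Returning the kept indices in their original order preserves the positional layout of the surviving tokens, which matters if positional embeddings are looked up by index.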
-
Synchronization of Diverse Agents via Phase Analysis
Authors:
Dan Wang,
Wei Chen,
Li Qiu
Abstract:
In this paper, the synchronization of heterogeneous agents interacting over a dynamical network is studied. The edge dynamics can model the inter-agent communications which are often heterogeneous by nature. They can also model the controllers of the agents which may be different for each agent or uniform for all the agents. Novel synchronization conditions are obtained for both cases from a phase perspective by exploiting a recently developed small phase theorem. The conditions scale well with the network and reveal the trade-off between the phases of node dynamics and edge dynamics. We also study the synchronizability problem which aims to characterize the allowable diversity of the agents for which controllers can be designed so as to achieve synchronization. The allowable diversity is captured in terms of phase conditions engaging the residue matrices of the agents at their persistent modes. Controller design algorithms are provided for the cases of agent-dependent and uniform controllers, respectively.
Submitted 8 November, 2022;
originally announced November 2022.
-
Non-line-of-sight imaging with arbitrary illumination and detection pattern
Authors:
Xintong Liu,
Jianyu Wang,
Zuoqiang Shi,
Xing Fu,
Lingyun Qiu
Abstract:
Non-line-of-sight (NLOS) imaging aims at reconstructing targets obscured from the direct line of sight. Existing NLOS imaging algorithms require dense measurements at rectangular grid points in a large area of the relay surface, which severely hinders their applicability to variable relay scenarios in practical applications such as robotic vision, autonomous driving, rescue operations and remote sensing. In this work, we propose a Bayesian framework for NLOS imaging with no specific requirements on the spatial pattern of illumination and detection points. By introducing virtual confocal signals, we design a confocal complemented signal-object collaborative regularization (CC-SOCR) algorithm for high-quality reconstructions. Our approach is capable of reconstructing both the albedo and surface normal of the hidden objects with fine details under the most general relay setting. Moreover, with a regular relay surface, coarse rather than dense measurements are enough for our approach, such that the acquisition time can be reduced significantly. As demonstrated in multiple experiments, the new framework substantially enhances the applicability of NLOS imaging.
Submitted 1 November, 2022;
originally announced November 2022.
-
Optimal $(0,1)$-Matrix Completion with Majorization Ordered Objectives (To the memory of Pravin Varaiya)
Authors:
Yanfang Mo,
Wei Chen,
Keyou You,
Li Qiu
Abstract:
We propose and examine two optimal $(0,1)$-matrix completion problems with majorization ordered objectives. They elevate the seminal study by Gale and Ryser from feasibility to optimality in partial order programming (POP), referring to optimization with partially ordered objectives. We showcase their applications in electric vehicle charging, portfolio optimization, and secure data storage. Solving such integer POP (iPOP) problems is challenging because of the possible non-comparability among objective values and the integer requirements. Nevertheless, we prove the essential uniqueness of all optimal objective values and identify two particular ones for each of the two inherently symmetric iPOP problems. Furthermore, for every optimal objective value, we decompose the construction of an associated optimal $(0,1)$-matrix into a series of sorting processes, respectively agreeing with the rule of thumb "peak shaving" or "valley filling." We show that the resulting algorithms have linear time complexities and verify their empirical efficiency via numerical simulations compared to the standard order-preserving method for POP.
Submitted 9 September, 2022;
originally announced September 2022.
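The feasibility question that the abstract above attributes to Gale and Ryser can be sketched in a few lines. This is only a minimal illustration of the classical Gale-Ryser condition (a $(0,1)$-matrix with given row and column sums exists exactly when the sums agree and the sorted column sums are majorized by the conjugate of the row sums). It is not the paper's optimal completion algorithm, and the function names are hypothetical.

```python
def conjugate(row_sums, n_cols):
    """Conjugate sequence: entry j counts the rows whose sum is at least j (j = 1..n_cols)."""
    return [sum(1 for r in row_sums if r >= j) for j in range(1, n_cols + 1)]


def gale_ryser_feasible(row_sums, col_sums):
    """Return True if some (0,1)-matrix has these row sums and column sums."""
    if any(x < 0 for x in list(row_sums) + list(col_sums)):
        return False
    if sum(row_sums) != sum(col_sums):
        return False
    conj = conjugate(row_sums, len(col_sums))
    cols = sorted(col_sums, reverse=True)
    partial_cols = 0
    partial_conj = 0
    # Majorization check: every prefix sum of the sorted column sums
    # must not exceed the matching prefix sum of the conjugate sequence.
    for c, r in zip(cols, conj):
        partial_cols += c
        partial_conj += r
        if partial_cols > partial_conj:
            return False
    return True
```

For example, row sums `[2, 1]` with column sums `[1, 1, 1]` are feasible, while row sums `[3, 0]` with column sums `[2, 1]` are not, because a row cannot hold three ones in a two-column matrix.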
-
Improving Hypernasality Estimation with Automatic Speech Recognition in Cleft Palate Speech
Authors:
Kaitao Song,
Teng Wan,
Bixia Wang,
Huiqiang Jiang,
Luna Qiu,
Jiahang Xu,
Li** Jiang,
Qun Lou,
Yuqing Yang,
Dongsheng Li,
Xudong Wang,
Lili Qiu
Abstract:
Hypernasality is an abnormal resonance in human speech production, especially in patients with craniofacial anomalies such as cleft palate. In clinical application, hypernasality estimation is crucial in cleft palate diagnosis, as its results determine the subsequent surgery and additional speech therapy. Therefore, an automatic hypernasality assessment method will help speech-language pathologists make precise diagnoses. Existing methods for hypernasality estimation only conduct acoustic analysis on low-resource cleft palate datasets, using statistical or neural network-based features. In this paper, we propose a novel approach that uses an automatic speech recognition model to improve hypernasality estimation. Specifically, we first pre-train an encoder-decoder framework with an automatic speech recognition (ASR) objective on a speech-to-text dataset, and then fine-tune the ASR encoder on the cleft palate dataset for hypernasality estimation. With this design, our model for hypernasality estimation benefits from the advantages of the ASR model: 1) compared with the low-resource cleft palate dataset, the ASR task usually includes large-scale speech data in the general domain, which enables better model generalization; 2) the text annotations in the ASR dataset guide the model to extract better acoustic features. Experimental results on two cleft palate datasets demonstrate that our method achieves superior performance compared with previous approaches.
Submitted 9 August, 2022;
originally announced August 2022.
-
Robust Transmit Beamforming for Secure Integrated Sensing and Communication
Authors:
Zixiang Ren,
Ling Qiu,
Jie Xu,
Derrick Wing Kwan Ng
Abstract:
This paper studies a downlink secure integrated sensing and communication (ISAC) system, in which a multi-antenna base station (BS) transmits confidential messages to a single-antenna communication user (CU) while performing sensing on targets that may act as suspicious eavesdroppers. To ensure the quality of target sensing while preventing their potential eavesdropping, the BS combines the transmit confidential information signals with additional dedicated sensing signals, which play a dual role of artificial noise (AN) for degrading the qualities of eavesdropping channels. Under this setup, we jointly design the transmit information and sensing beamforming, with the objective of minimizing the weighted sum of beampattern matching errors and cross-correlation patterns for sensing subject to secure communication constraints. The robust design takes into account the channel state information (CSI) imperfectness of the eavesdroppers in two practical CSI error scenarios. First, we consider the scenario with bounded CSI errors of eavesdroppers, in which the worst-case secrecy rate constraint is adopted to ensure secure communication performance. In this scenario, we present the optimal solution to the worst-case secrecy rate constrained sensing beampattern optimization problem, by adopting the techniques of S-procedure, semi-definite relaxation (SDR), and a one-dimensional (1D) search, for which the tightness of the SDR is rigorously proved. Next, we consider the scenario with Gaussian CSI errors of eavesdroppers, in which the secrecy outage probability constraint is adopted. In this scenario, we present an efficient algorithm to solve the more challenging secrecy outage-constrained sensing beampattern optimization problem, by exploiting the convex restriction technique based on the Bernstein-type inequality, together with the SDR and 1D search.
Submitted 27 July, 2022;
originally announced July 2022.
-
Spatial Aware Multi-Task Learning Based Speech Separation
Authors:
Wei Sun,
Mei Wang,
Lili Qiu
Abstract:
During the COVID-19 pandemic, online meetings have become an indispensable part of our lives. This trend is likely to continue due to their convenience and broad reach. However, background noise from other family members, roommates, and office-mates not only degrades the voice quality but also raises serious privacy issues. In this paper, we develop a novel system, called Spatial Aware Multi-task learning-based Separation (SAMS), to extract audio signals from the target user during teleconferencing. Our solution consists of three novel components: (i) generating fine-grained location embeddings from the user's voice and inaudible tracking sound, which contains the user's position and rich multipath information, (ii) developing a source separation neural network using multi-task learning to jointly optimize source separation and location, and (iii) significantly speeding up inference to provide a real-time guarantee. Our testbed experiments demonstrate the effectiveness of our approach.
Submitted 20 July, 2022;
originally announced July 2022.
-
MIMO-DoAnet: Multi-channel Input and Multiple Outputs DoA Network with Unknown Number of Sound Sources
Authors:
Haoran Yin,
Meng Ge,
Yanjie Fu,
Gaoyan Zhang,
Longbiao Wang,
Lei Zhang,
Lin Qiu,
Jianwu Dang
Abstract:
Recent neural network based Direction of Arrival (DoA) estimation algorithms have performed well in scenarios with an unknown number of sound sources. These algorithms usually map the multi-channel audio input to a single output (i.e., the overall spatial pseudo-spectrum (SPS) of all sources), an approach called MISO. However, such MISO algorithms strongly depend on an empirical threshold setting and on the assumption that the angles between the sound sources are greater than a fixed angle. To address these limitations, we propose a novel multi-channel input and multiple outputs DoA network called MIMO-DoAnet. Unlike the general MISO algorithms, MIMO-DoAnet predicts the SPS coding of each sound source with the help of the informative spatial covariance matrix. By doing so, the threshold task of detecting the number of sound sources becomes an easier task of detecting whether there is a sound source in each output, and the serious interaction between sound sources disappears during the inference stage. Experimental results show that MIMO-DoAnet achieves F1 score improvements of 18.6% relative and 13.3% absolute in 3-source scenes, and 34.4% relative and 20.2% absolute in 4-source scenes, compared with the MISO baseline system. The results also demonstrate that MIMO-DoAnet alleviates the threshold setting problem and solves the angle assumption problem effectively.
Submitted 16 November, 2022; v1 submitted 15 July, 2022;
originally announced July 2022.
-
Visual-Assisted Sound Source Depth Estimation in the Wild
Authors:
Wei Sun,
Lili Qiu
Abstract:
Depth estimation enables a wide variety of 3D applications, such as robotics, autonomous driving, and virtual reality. Despite significant work in this area, it remains open how to enable accurate, low-cost, high-resolution, and large-range depth estimation. Inspired by the flash-to-bang phenomenon (i.e. hearing the thunder after seeing the lightning), this paper develops FBDepth, the first audio-visual depth estimation framework. It takes the difference between the time-of-flight (ToF) of the light and the sound to infer the sound source depth. FBDepth is the first to incorporate video and audio with both semantic features and spatial hints for range estimation. It first aligns correspondence between the video track and audio track to locate the target object and target sound in a coarse granularity. Based on the observation of moving objects' trajectories, FBDepth proposes to estimate the intersection of optical flow before and after the sound production to locate video events in time. FBDepth feeds the estimated timestamp of the video event and the audio clip for the final depth estimation. We use a mobile phone to collect 3000+ video clips with 20 different objects at up to 50 m. FBDepth decreases the Absolute Relative error (AbsRel) by 55% compared to RGB-based methods.
Submitted 20 July, 2022; v1 submitted 6 July, 2022;
originally announced July 2022.
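The flash-to-bang idea in the abstract above reduces to simple arithmetic once the delay between the visual event and the arrival of its sound is known. The sketch below shows only that final conversion, not the FBDepth pipeline; the function name and the assumed speed of sound (343 m/s, roughly air at 20 °C) are illustrative choices and are not taken from the paper.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # exact by definition
SPEED_OF_SOUND_M_S = 343.0          # assumption: dry air at about 20 degrees Celsius


def depth_from_delay(delay_s, v_sound=SPEED_OF_SOUND_M_S, v_light=SPEED_OF_LIGHT_M_S):
    """Distance to a source whose sound arrives delay_s seconds after its light.

    Both signals travel the same distance d, so
        delay = d / v_sound - d / v_light
    and therefore
        d = delay / (1 / v_sound - 1 / v_light).
    """
    if delay_s < 0:
        raise ValueError("the sound cannot arrive before the light")
    return delay_s / (1.0 / v_sound - 1.0 / v_light)
```

With these assumed constants, a delay of 0.1 s corresponds to a depth of about 34.3 m. Because light is so much faster than sound, the correction term `1 / v_light` changes the result only by about one part in a million.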
-
Fundamental CRB-Rate Tradeoff in Multi-antenna Multicast Channel with ISAC
Authors:
Zixiang Ren,
Xianxin Song,
Yuan Fang,
Ling Qiu,
Jie Xu
Abstract:
This paper studies the multi-antenna multicast channel with integrated sensing and communication (ISAC), in which a multi-antenna base station (BS) sends common messages to a set of single-antenna communication users (CUs) and simultaneously estimates the parameters of an extended target via radar sensing. We investigate the fundamental performance limits of this ISAC system, in terms of the achievable rate for communication and the estimation Cramér-Rao bound (CRB) for sensing. First, we derive the optimal transmit covariance in semi-closed form to balance the CRB-rate (C-R) tradeoff, and accordingly characterize the outer bound of a so-called C-R region. It is shown that the optimal transmit covariance should be of full rank, consisting of both information-carrying and dedicated sensing signals in general. Next, we consider a practical joint information and sensing beamforming design, and propose an efficient approach to optimize the joint beamforming for balancing the C-R tradeoff. Numerical results are presented to show the C-R region achieved by the optimal transmit covariance and the joint beamforming, as compared to other benchmark schemes.
Submitted 7 August, 2022; v1 submitted 31 May, 2022;
originally announced May 2022.
-
Gain and phase type multipliers for structured feedback robustness
Authors:
Axel Ringh,
Xin Mao,
Wei Chen,
Li Qiu,
Sei Zhen Khong
Abstract:
It is known that the stability of a feedback interconnection of two linear time-invariant systems implies that the graphs of the open-loop systems are quadratically separated. This separation is defined by an object known as the multiplier. The theory of integral quadratic constraints shows that the converse also holds under certain conditions. This paper establishes that if the feedback is robustly stable against certain structured uncertainty, then there always exists a multiplier that takes a corresponding form. In particular, if the feedback is robustly stable to certain gain-type uncertainty, then there exists a corresponding multiplier that is of phase-type, i.e., its diagonal blocks are zeros. These results build on the notion of phases of matrices and systems, which was recently introduced in the field of control. Similarly, if the feedback is robustly stable to certain phase-type uncertainty, then there exists a gain-type multiplier, i.e., its off-diagonal blocks are zeros. The results are meaningfully instructive in the search for a valid multiplier for establishing robust closed-loop stability, and cover the well-known small-gain and the recent small-phase theorems.
Submitted 22 March, 2022;
originally announced March 2022.
-
When Small Gain Meets Small Phase
Authors:
Di Zhao,
Wei Chen,
Li Qiu
Abstract:
In this paper, we investigate the feedback stability of multiple-input multiple-output linear time-invariant systems with combined gain and phase information. To begin with, we explore the stability condition for a class of so-called easily controllable systems, which have small phase at low frequency ranges and low gain at high frequency. Next, we extend the stability condition via frequency-wise gain and phase combination, based on which a mixed small gain and phase condition with necessity, called a small vase theorem, is then obtained. Furthermore, the fusion of gain and phase information is investigated by a geometric approach based on the Davis-Wielandt shell. Finally, for the purpose of efficient computation and controller synthesis, we present a bounded & sectored real lemma, which gives state-space characterization of combined gain and phase properties based on a triple of linear matrix inequalities.
Submitted 21 February, 2022; v1 submitted 16 January, 2022;
originally announced January 2022.
-
Optimal Transmit Beamforming for Secrecy Integrated Sensing and Communication
Authors:
Zixiang Ren,
Ling Qiu,
Jie Xu
Abstract:
This paper studies a secrecy integrated sensing and communication (ISAC) system, in which a multi-antenna base station (BS) aims to send confidential messages to a single-antenna communication user (CU), and at the same time sense several targets that may be suspicious eavesdroppers. To ensure the sensing quality while preventing the eavesdropping, we consider that the BS sends dedicated sensing signals (in addition to confidential information signals) that play a dual role of artificial noise (AN) for confusing the eavesdropping targets. Under this setup, we jointly optimize the transmit information and sensing beamforming at the BS, to minimize the matching error between the transmit beampattern and a desired beampattern for sensing, subject to the minimum secrecy rate requirement at the CU and the transmit power constraint at the BS. Although the formulated problem is non-convex, we propose an algorithm to obtain the globally optimal solution by using the semidefinite relaxation (SDR) together with a one-dimensional (1D) search. Next, to avoid the high complexity induced by the 1D search, we also present two sub-optimal solutions based on zero-forcing and separate beamforming designs, respectively. Numerical results show that the proposed designs properly adjust the information and sensing beams to balance the tradeoffs among communicating with CU, sensing targets, and confusing eavesdroppers, thus achieving desirable transmit beampattern for sensing while ensuring the CU's secrecy rate.
Submitted 25 October, 2021;
originally announced October 2021.
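The "minimum secrecy rate requirement" mentioned in the abstract above is commonly expressed as the gap between the rate at the legitimate user and the best rate at any eavesdropper. The sketch below uses that textbook definition with scalar SINR values supplied directly. How the paper computes those SINRs from the beamformers and artificial noise is not given in the abstract, so the inputs and the function name here are assumptions.

```python
import math


def secrecy_rate(sinr_cu, sinr_eavesdroppers):
    """Secrecy rate in bits/s/Hz: CU rate minus the strongest eavesdropper rate, floored at zero."""
    rate_cu = math.log2(1.0 + sinr_cu)
    # The worst case is the eavesdropper with the highest achievable rate.
    rate_eve = max((math.log2(1.0 + s) for s in sinr_eavesdroppers), default=0.0)
    return max(0.0, rate_cu - rate_eve)
```

For instance, a CU SINR of 15 against eavesdropper SINRs of 3 and 1 gives `log2(16) - log2(4) = 2` bits/s/Hz, and the rate drops to zero whenever any eavesdropper sees a better SINR than the CU.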
-
Parallel Feedforward Compensation for Output Synchronization: Fully Distributed Control and Indefinite Laplacian
Authors:
Mengmou Li,
Ioannis Lestas,
Li Qiu
Abstract:
This work is associated with the use of parallel feedforward compensators (PFCs) for the problem of output synchronization over heterogeneous agents and the benefits this approach can provide. Specifically, it addresses the addition of stable PFCs on agents that interact with each other using diffusive couplings. The value in the application of such PFC is twofold. Firstly, it has been an issue that output synchronization among passivity-short systems requires global information for the design of controllers in the cases when initial conditions need to be taken into account, such as average consensus and distributed optimization. We show that a stable PFC can be designed to passivate a passivity-short system while its output asymptotically vanishes as its input tends to zero. As a result, output synchronization is achieved among these systems by fully distributed controls without altering the original consensus results. Secondly, in the literature of output synchronization over signed weighted graphs, it is generally required that the graph Laplacian be positive semidefinite, i.e., $L \geq 0$ for undirected graphs or $L + L^T \geq 0$ for balanced directed graphs. We show that the PFC serves as output feedback to the communication graph to enhance the robustness against negative weight edges. As a result, output synchronization is achieved over a signed weighted and balanced graph, even if the corresponding Laplacian is not positive semidefinite.
Submitted 25 April, 2022; v1 submitted 25 October, 2021;
originally announced October 2021.
-
The Singular Angle of Nonlinear Systems
Authors:
Chao Chen,
Wei Chen,
Di Zhao,
Sei Zhen Khong,
Li Qiu
Abstract:
In this paper, we introduce an angle notion, called the singular angle, for stable nonlinear systems from an input-output perspective. The proposed system singular angle, based on the angle between $\mathcal{L}_2$-signals, describes an upper bound for the "rotating effect" from the system input to output signals. It is, thus, different from the recently appeared nonlinear system phase which adopts the complexification of real-valued signals using the Hilbert transform. It can quantify the passivity and serve as an angular counterpart to the system $\mathcal{L}_2$-gain. It also provides an alternative to the nonlinear system phase. A nonlinear small angle theorem, which involves a comparison of the loop system angle with $\pi$, is established for feedback stability analysis. When dealing with multi-input multi-output linear time-invariant (LTI) systems, we further come up with the frequency-wise and $\mathcal{H}_\infty$ singular angle notions based on the matrix singular angle, and develop corresponding LTI small angle theorems.
Submitted 3 September, 2021;
originally announced September 2021.
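The building block named in the abstract above, the angle between two signals, can be illustrated for finite sampled sequences using the usual inner-product formula $\theta(u, y) = \arccos\big(\langle u, y\rangle / (\|u\|\,\|y\|)\big)$. The sketch below evaluates that angle for a single input-output pair only. The system singular angle described in the abstract bounds the rotating effect over inputs, which this example does not attempt, and the function name is hypothetical.

```python
import math


def signal_angle(u, y):
    """Angle in radians, in [0, pi], between two real sequences of equal length."""
    if len(u) != len(y):
        raise ValueError("signals must have the same length")
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_u == 0.0 or norm_y == 0.0:
        raise ValueError("the angle is undefined for a zero signal")
    cosine = sum(a * b for a, b in zip(u, y)) / (norm_u * norm_y)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cosine)))
```

An output aligned with its input gives an angle of 0, an orthogonal output gives $\pi/2$, and a sign-flipped output gives $\pi$, the value against which the small angle theorem in the abstract compares the loop angle.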
-
A Phase Theory of MIMO LTI Systems
Authors:
Wei Chen,
Dan Wang,
Sei Zhen Khong,
Li Qiu
Abstract:
In this paper, we define the phase response for a class of multi-input multi-output (MIMO) linear time-invariant (LTI) systems whose frequency responses are (semi-)sectorial at all frequencies. The newly defined phase concept subsumes the well-known notions of positive real systems and negative imaginary systems. We formulate a small phase theorem for feedback stability, which complements the celebrated small gain theorem. The small phase theorem lays the foundation of a phase theory of MIMO systems. We also discuss time-domain interpretations of phase-bounded systems via both energy signal analysis and power signal analysis. In addition, a sectored real lemma is derived for the computation of MIMO phases, which serves as a natural counterpart of the bounded real lemma.
Submitted 21 October, 2022; v1 submitted 8 May, 2021;
originally announced May 2021.
-
Generative Adversarial Network for Image Synthesis
Authors:
Yang Lei,
Richard L. J. Qiu,
Tonghe Wang,
Walter J. Curran,
Tian Liu,
Xiaofeng Yang
Abstract:
This chapter reviews recent developments of generative adversarial networks (GAN)-based methods for medical and biomedical image synthesis tasks. These methods are classified into conditional GAN and Cycle-GAN according to the network architecture designs. For each category, a literature survey is given, which covers discussions of the network architecture designs, highlights important contributions and identifies specific challenges.
Submitted 30 December, 2020;
originally announced December 2020.
-
Phase of Nonlinear Systems
Authors:
Chao Chen,
Di Zhao,
Wei Chen,
Sei Zhen Khong,
Li Qiu
Abstract:
In this paper, we propose a definition of phase for a class of stable nonlinear systems called semi-sectorial systems, from an input-output perspective. The definition involves the Hilbert transform as a critical instrument to complexify real-valued signals since the notion of phase arises most naturally in the complex domain. The proposed nonlinear system phase, serving as a counterpart of $\mathcal{L}_2$-gain, quantifies the passivity and is highly related to the dissipativity. It also possesses a nice physical interpretation which quantifies the tradeoff between the real energy and reactive energy. A nonlinear small phase theorem is then established for feedback stability analysis of semi-sectorial systems. Additionally, its generalized version is proposed via the use of multipliers. These nonlinear small phase theorems generalize a version of the classical passivity theorem and a recently appeared linear time-invariant small phase theorem.
Submitted 2 May, 2021; v1 submitted 30 November, 2020;
originally announced December 2020.
-
Low Phase-Rank Approximation
Authors:
Di Zhao,
Axel Ringh,
Li Qiu,
Sei Zhen Khong
Abstract:
In this paper, we propose and solve a low phase-rank approximation problem, which serves as a counterpart to the well-known low-rank approximation problem and the Schmidt-Mirsky theorem. More specifically, a nonzero complex number can be specified by its gain and phase, and while it is generally accepted that the gains of a matrix may be defined by its singular values, there is no widely accepted definition for its phases. In this work, we consider sectorial matrices, whose numerical ranges do not contain the origin, and adopt the canonical angles of such matrices as their phases. Similarly to the rank of a matrix defined to be the number of its nonzero singular values, we define the phase-rank of a sectorial matrix as the number of its nonzero phases. While a low-rank approximation problem is associated with matrix arithmetic means, as a natural parallel we formulate a low phase-rank approximation problem using matrix geometric means to measure the approximation error. A characterization of the solutions to the proposed problem is then obtained, when both the objective matrix and the approximant are restricted to be positive-imaginary. Moreover, the obtained solution has the same flavor as the Schmidt-Mirsky theorem on low-rank approximation problems. In addition, we provide an alternative formulation of the low phase-rank approximation problem using geodesic distances between sectorial matrices. The two formulations give rise to the exact same set of solutions when the involved matrices are additionally assumed to be unitary.
Submitted 30 November, 2020;
originally announced November 2020.
-
Vertical-Horizontal Structured Attention for Generating Music with Chords
Authors:
Yizhou Zhao,
Liang Qiu,
Wensi Ai,
Feng Shi,
Song-Chun Zhu
Abstract:
In this paper, we propose a lightweight music-generating model based on variational autoencoder (VAE) with structured attention. Generating music is different from generating text because the melodies with chords give listeners distinguished polyphonic feelings. In a piece of music, a chord consisting of multiple notes comes from either the mixture of multiple instruments or the combination of multiple keys of a single instrument. We focus our study on the latter. Our model captures not only the temporal relations along time but the structure relations between keys. Experimental results show that our model has a better performance than baseline MusicVAE in capturing notes in a chord. Besides, our method accords with music theory since it maintains the configuration of the circle of fifths, distinguishes major and minor keys from interval vectors, and manifests meaningful structures between music phrases.
Submitted 17 November, 2020;
originally announced November 2020.
-
On Spectral Properties of Signed Laplacians with Connections to Eventual Positivity
Authors:
Wei Chen,
Dan Wang,
Ji Liu,
Yongxin Chen,
Sei Zhen Khong,
Tamer Başar,
Karl H. Johansson,
Li Qiu
Abstract:
Signed graphs have appeared in a broad variety of applications, ranging from social networks to biological networks, from distributed control and computation to power systems. In this paper, we investigate spectral properties of signed Laplacians for undirected signed graphs. We find conditions on the negative weights under which a signed Laplacian is positive semidefinite via the Kron reduction and multiport network theory. For signed Laplacians that are indefinite, we characterize their inertias with the same framework. Furthermore, we build connections between signed Laplacians, generalized M-matrices, and eventually exponentially positive matrices.
Submitted 8 September, 2020;
originally announced September 2020.
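Why negative weights can destroy positive semidefiniteness, as the abstract above discusses, can be seen from the quadratic form $x^T L x = \sum_{(i,j)} w_{ij}(x_i - x_j)^2$. The sketch below assumes the signed Laplacian is built with signed (not absolute) weights on the diagonal, which is the convention under which indefiniteness is possible. It does not implement the paper's Kron reduction conditions, and the function names are hypothetical.

```python
def signed_laplacian(n, edges):
    """Signed Laplacian of an undirected graph; edges are (i, j, weight) and weights may be negative."""
    lap = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        lap[i][i] += w
        lap[j][j] += w
        lap[i][j] -= w
        lap[j][i] -= w
    return lap


def quadratic_form(lap, x):
    """Evaluate x^T L x; equals the sum over edges of w_ij * (x_i - x_j)^2."""
    n = len(x)
    return sum(x[i] * lap[i][j] * x[j] for i in range(n) for j in range(n))


# Path graph 0 - 1 - 2 with one positive and one negative edge weight.
L_example = signed_laplacian(3, [(0, 1, 1.0), (1, 2, -1.0)])
```

Here `quadratic_form(L_example, [1.0, 0.0, 0.0])` is positive while `quadratic_form(L_example, [0.0, 0.0, 1.0])` is negative, so this particular signed Laplacian is indefinite. A single negative value of the quadratic form is enough to rule out positive semidefiniteness, whereas certifying it requires an eigenvalue or inertia computation.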
-
Stabilization of Cascaded Two-Port Networked Systems with Simultaneous Nonlinear Uncertainties
Authors:
Di Zhao,
Sei Zhen Khong,
Li Qiu
Abstract:
We introduce a versatile framework to model and study networked control systems (NCSs). An NCS is described as a feedback interconnection of a plant and a controller communicating through a bidirectional channel modelled by cascaded nonlinear two-port networks. This model is sufficiently rich to capture various properties of a real-world communication channel, such as distortion, interference, and nonlinearity. Uncertainties in the plant, controller and communication channels can be handled simultaneously in the framework. We provide a necessary and sufficient condition for the robust finite-gain stability of an NCS when the model uncertainties in the plant and controller are measured by the gap metric and those in the nonlinear communication channels are measured by operator norms of the uncertain elements. This condition is given by an inequality involving "arcsine" of the uncertainty bounds and is derived from novel geometric insights underlying the robustness of a standard closed-loop system in the presence of conelike nonlinear perturbations on the system graphs.
Submitted 5 August, 2020;
originally announced August 2020.