MVMS-RCN: A Dual-Domain Unfolding CT Reconstruction with Multi-sparse-view and Multi-scale Refinement-correction

Authors: Xiaohong Fan, Ke Chen, Huaming Yi, Yin Yang, Jianping Zhang

Abstract: X-ray Computed Tomography (CT) is one of the most important diagnostic imaging techniques in clinical applications. Sparse-view CT imaging reduces the number of projection views to a lower radiation dose and alleviates the potential risk of radiation exposure. Most existing deep learning (DL) and deep unfolding sparse-view CT reconstruction methods: 1) do not fully use the projection data; 2) do not always link their architecture designs to a mathematical theory; 3) do not flexibly deal with multi-sparse-view reconstruction assignments. This paper aims to use mathematical ideas and design optimal DL imaging algorithms for sparse-view tomography reconstructions. We propose a novel dual-domain deep unfolding unified framework that offers a great deal of flexibility for multi-sparse-view CT reconstruction with different sampling views through a single model. This framework combines the theoretical advantages of model-based methods with the superior reconstruction performance of DL-based methods, resulting in the expected generalizability of DL. We propose a refinement module that utilizes unfolding projection domain to refine full-sparse-view projection errors, as well as an image domain correction module that distills multi-scale geometric error corrections to reconstruct sparse-view CT. This provides us with a new way to explore the potential of projection information and a new perspective on designing network architectures. All parameters of our proposed framework are learnable end to end, and our method possesses the potential to be applied to plug-and-play reconstruction. Extensive experiments demonstrate that our framework is superior to other existing state-of-the-art methods. Our source codes are available at https://github.com/fanxiaohong/MVMS-RCN.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 12 pages, submitted

arXiv:2402.19111 [pdf, other]

Deep Network for Image Compressed Sensing Coding Using Local Structural Sampling

Authors: Wenxue Cui, Xingtao Wang, Xiaopeng Fan, Shaohui Liu, Xinwei Gao, Debin Zhao

Abstract: Existing image compressed sensing (CS) coding frameworks usually solve an inverse problem based on measurement coding and optimization-based image reconstruction, which still exist the following two challenges: 1) The widely used random sampling matrix, such as the Gaussian Random Matrix (GRM), usually leads to low measurement coding efficiency. 2) The optimization-based reconstruction methods generally maintain a much higher computational complexity. In this paper, we propose a new CNN based image CS coding framework using local structural sampling (dubbed CSCNet) that includes three functional modules: local structural sampling, measurement coding and Laplacian pyramid reconstruction. In the proposed framework, instead of GRM, a new local structural sampling matrix is first developed, which is able to enhance the correlation between the measurements through a local perceptual sampling strategy. Besides, the designed local structural sampling matrix can be jointly optimized with the other functional modules during training process. After sampling, the measurements with high correlations are produced, which are then coded into final bitstreams by the third-party image codec. At last, a Laplacian pyramid reconstruction network is proposed to efficiently recover the target image from the measurement domain to the image domain. Extensive experimental results demonstrate that the proposed scheme outperforms the existing state-of-the-art CS coding methods, while maintaining fast computational speed.

Submitted 29 February, 2024; originally announced February 2024.

Comments: Accepted by ACM Transactions on Multimedia Computing Communications and Applications (TOMM)

arXiv:2401.05709 [pdf, other]

Probability-based Distance Estimation Model for 3D DV-Hop Localization in WSNs

Authors: Penghong Wang, Hao Wang, Wenrui Li, Xiaopeng Fan, Debin Zhao

Abstract: Localization is one of the pivotal issues in wireless sensor network applications. In 3D localization studies, most algorithms focus on enhancing the location prediction process, lacking theoretical derivation of the detection distance of an anchor node at the varying hops, engenders a localization performance bottleneck. To address this issue, we propose a probability-based average distance estimation (PADE) model that utilizes the probability distribution of node distances detected by an anchor node. The aim is to mathematically derive the average distances of nodes detected by an anchor node at different hops. First, we develop a probability-based maximum distance estimation (PMDE) model to calculate the upper bound of the distance detected by an anchor node. Then, we present the PADE model, which relies on the upper bound obtained of the distance by the PMDE model. Finally, the obtained average distance is used to construct a distance loss function, and it is embedded with the traditional distance loss function into a multi-objective genetic algorithm to predict the locations of unknown nodes. The experimental results demonstrate that the proposed method achieves state-of-the-art performance in random and multimodal distributed sensor networks. The average localization accuracy is improved by 3.49%-12.66% and 3.99%-22.34%, respectively.

Submitted 11 January, 2024; originally announced January 2024.

arXiv:2401.02662 [pdf, other]

GainNet: Coordinates the Odd Couple of Generative AI and 6G Networks

Authors: Ning Chen, Jie Yang, Zhipeng Cheng, Xuwei Fan, Zhang Liu, Bangzhen Huang, Yifeng Zhao, Lianfen Huang, Xiaojiang Du, Mohsen Guizani

Abstract: The rapid expansion of AI-generated content (AIGC) reflects the iteration from assistive AI towards generative AI (GAI) with creativity. Meanwhile, the 6G networks will also evolve from the Internet-of-everything to the Internet-of-intelligence with hybrid heterogeneous network architectures. In the future, the interplay between GAI and the 6G will lead to new opportunities, where GAI can learn the knowledge of personalized data from the massive connected 6G end devices, while GAI's powerful generation ability can provide advanced network solutions for 6G network and provide 6G end devices with various AIGC services. However, they seem to be an odd couple, due to the contradiction of data and resources. To achieve a better-coordinated interplay between GAI and 6G, the GAI-native networks (GainNet), a GAI-oriented collaborative cloud-edge-end intelligence framework, is proposed in this paper. By deeply integrating GAI with 6G network design, GainNet realizes the positive closed-loop knowledge flow and sustainable-evolution GAI model optimization. On this basis, the GAI-oriented generic resource orchestration mechanism with integrated sensing, communication, and computing (GaiRom-ISCC) is proposed to guarantee the efficient operation of GainNet. Two simple case studies demonstrate the effectiveness and robustness of the proposed schemes. Finally, we envision the key challenges and future directions concerning the interplay between GAI models and 6G networks.

Submitted 5 January, 2024; originally announced January 2024.

Comments: 10 pages, 5 figures, 1 table

arXiv:2312.15668 [pdf, ps, other]

Air-to-Ground Communications Beyond 5G: UAV Swarm Formation Control and Tracking

Authors: Xiao Fan, Peiran Wu, Minghua Xia

Abstract: Unmanned aerial vehicle (UAV) communications have been widely accepted as promising technologies to support air-to-ground communications in the forthcoming sixth-generation (6G) wireless networks. This paper proposes a novel air-to-ground communication model consisting of aerial base stations served by UAVs and terrestrial user equipments (UEs) by integrating the technique of coordinated multi-point (CoMP) transmission with the theory of stochastic geometry. In particular, a CoMP set consisting of multiple UAVs is developed based on the theory of Poisson-Delaunay tetrahedralization. Effective UAV formation control and UAV swarm tracking schemes for two typical scenarios, including static and mobile UEs, are also developed using the multi-agent system theory to ensure that collaborative UAVs can efficiently reach target spatial positions for mission execution. Thanks to the ease of mathematical tractability, this model provides explicit performance expressions for a typical UE's coverage probability and achievable ergodic rate. Extensive simulation and numerical results corroborate that the proposed scheme outperforms UAV communications without CoMP transmission and obtains similar performance to the conventional CoMP scheme while avoiding search overhead.

Submitted 25 December, 2023; originally announced December 2023.

Comments: 14 pages, 9 figures, to appear in IEEE TWC

arXiv:2312.08743 [pdf, other]

FAPP: Fast and Adaptive Perception and Planning for UAVs in Dynamic Cluttered Environments

Authors: Minghao Lu, Xiyu Fan, Han Chen, Peng Lu

Abstract: Obstacle avoidance for Unmanned Aerial Vehicles (UAVs) in cluttered environments is significantly challenging. Existing obstacle avoidance for UAVs either focuses on fully static environments or static environments with only a few dynamic objects. In this paper, we take the initiative to consider the obstacle avoidance of UAVs in dynamic cluttered environments in which dynamic objects are the dominant objects. This type of environment poses significant challenges to both perception and planning. Multiple dynamic objects possess various motions, making it extremely difficult to estimate and predict their motions using one motion model. The planning must be highly efficient to avoid cluttered dynamic objects. This paper proposes Fast and Adaptive Perception and Planning (FAPP) for UAVs flying in complex dynamic cluttered environments. A novel and efficient point cloud segmentation strategy is proposed to distinguish static and dynamic objects. To address multiple dynamic objects with different motions, an adaptive estimation method with covariance adaptation is proposed to quickly and accurately predict their motions. Our proposed trajectory optimization algorithm is highly efficient, enabling it to avoid fast objects. Furthermore, an adaptive re-planning method is proposed to address the case when the trajectory optimization cannot find a feasible solution, which is common for dynamic cluttered environments. Extensive validations in both simulation and real-world experiments demonstrate the effectiveness of our proposed system for highly dynamic and cluttered environments.

Submitted 14 December, 2023; originally announced December 2023.

arXiv:2311.07758 [pdf, other]

Synchrophasor Data Anomaly Detection on Grid Edge by 5G Communication and Adjacent Compute

Authors: Chuan Qin, Dexin Wang, Kishan Prudhvi Guddanti, Xiaoyuan Fan, Zhangshuan Hou

Abstract: The fifth-generation mobile communication (5G) technology offers opportunities to enhance the real-time monitoring of grids. The 5G-enabled phasor measurement units (PMUs) feature flexible positioning and cost-effective long-term maintenance without the constraints of fixing wires. This paper is the first to demonstrate the applicability of 5G in PMU communication, and the experiment was carried out at Verizon non-standalone test-bed at Pacific Northwest National Laboratory (PNNL) Advanced Wireless Communication lab. The performance of the 5G-enabled PMU communication setup is reviewed and discussed in this paper, and a generalized dynamic linear model (GDLM) based real-time synchrophasor data anomaly detection use-case is presented. Last but not least, the practicability of implementing 5G for wide-area protection strategies is explored and discussed by analyzing the experimental results.

Submitted 13 November, 2023; originally announced November 2023.

Comments: 5 pages, 4 figures

arXiv:2311.03815 [pdf, other]

Integrated Sensing, Communication, and Computing for Cost-effective Multimodal Federated Perception

Authors: Ning Chen, Zhipeng Cheng, Xuwei Fan, Bangzhen Huang, Yifeng Zhao, Lianfen Huang, Xiaojiang Du, Mohsen Guizani

Abstract: Federated learning (FL) is a classic paradigm of 6G edge intelligence (EI), which alleviates privacy leaks and high communication pressure caused by traditional centralized data processing in the artificial intelligence of things (AIoT). The implementation of multimodal federated perception (MFP) services involves three sub-processes, including sensing-based multimodal data generation, communication-based model transmission, and computing-based model training, ultimately relying on available underlying multi-domain physical resources such as time, frequency, and computing power. How to reasonably coordinate the multi-domain resources scheduling among sensing, communication, and computing, therefore, is crucial to the MFP networks. To address the above issues, this paper investigates service-oriented resource management with integrated sensing, communication, and computing (ISCC). With the incentive mechanism of the MFP service market, the resources management problem is redefined as a social welfare maximization problem, where the idea of "expanding resources" and "reducing costs" is used to improve learning performance gain and reduce resource costs. Experimental results demonstrate the effectiveness and robustness of the proposed resource scheduling mechanisms.

Submitted 7 November, 2023; originally announced November 2023.

arXiv:2310.08804 [pdf, other]

Spiking Semantic Communication for Feature Transmission with HARQ

Authors: Mengyang Wang, Jiahui Li, Mengyao Ma, Xiaopeng Fan

Abstract: In Collaborative Intelligence (CI), the Artificial Intelligence (AI) model is divided between the edge and the cloud, with intermediate features being sent from the edge to the cloud for inference. Several deep learning-based Semantic Communication (SC) models have been proposed to reduce feature transmission overhead and mitigate channel noise interference. Previous research has demonstrated that Spiking Neural Network (SNN)-based SC models exhibit greater robustness on digital channels compared to Deep Neural Network (DNN)-based SC models. However, the existing SNN-based SC models require fixed time steps, resulting in fixed transmission bandwidths that cannot be adaptively adjusted based on channel conditions. To address this issue, this paper introduces a novel SC model called SNN-SC-HARQ, which combines the SNN-based SC model with the Hybrid Automatic Repeat Request (HARQ) mechanism. SNN-SC-HARQ comprises an SNN-based SC model that supports the transmission of features at varying bandwidths, along with a policy model that determines the appropriate bandwidth. Experimental results show that SNN-SC-HARQ can dynamically adjust the bandwidth according to the channel conditions without performance loss.

Submitted 12 October, 2023; originally announced October 2023.

arXiv:2310.06291 [pdf, other]

Three-Dimensional Medical Image Fusion with Deformable Cross-Attention

Authors: Lin Liu, Xinxin Fan, Chulong Zhang, Jingjing Dai, Yaoqin Xie, Xiaokun Liang

Abstract: Multimodal medical image fusion plays an instrumental role in several areas of medical image processing, particularly in disease recognition and tumor detection. Traditional fusion methods tend to process each modality independently before combining the features and reconstructing the fusion image. However, this approach often neglects the fundamental commonalities and disparities between multimodal information. Furthermore, the prevailing methodologies are largely confined to fusing two-dimensional (2D) medical image slices, leading to a lack of contextual supervision in the fusion images and subsequently, a decreased information yield for physicians relative to three-dimensional (3D) images. In this study, we introduce an innovative unsupervised feature mutual learning fusion network designed to rectify these limitations. Our approach incorporates a Deformable Cross Feature Blend (DCFB) module that facilitates the dual modalities in discerning their respective similarities and differences. We have applied our model to the fusion of 3D MRI and PET images obtained from 660 patients in the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. Through the application of the DCFB module, our network generates high-quality MRI-PET fusion images. Experimental results demonstrate that our method surpasses traditional 2D image fusion methods in performance metrics such as Peak Signal to Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM). Importantly, the capacity of our method to fuse 3D images enhances the information available to physicians and researchers, thus marking a significant step forward in the field. The code will soon be available online.

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2310.04992 [pdf, other]

VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence

Authors: Jianing Qiu, Jian Wu, Hao Wei, Peilun Shi, Minqing Zhang, Yunyun Sun, Lin Li, Hanruo Liu, Hongyi Liu, Simeng Hou, Yuyang Zhao, Xuehui Shi, Junfang Xian, Xiaoxia Qu, Sirui Zhu, Lijie Pan, Xiaoniao Chen, Xiaojia Zhang, Shuai Jiang, Kebing Wang, Chenlong Yang, Mingqiang Chen, Sujie Fan, Jianhua Hu, Aiguo Lv , et al. (17 additional authors not shown)

Abstract: We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals, covering a broad range of ophthalmic diseases, modalities, imaging devices, and demography. After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications, such as disease screening and diagnosis, disease prognosis, subclassification of disease phenotype, and systemic biomarker and disease prediction, with each application enhanced with expert-level intelligence and accuracy. The generalist intelligence of VisionFM outperformed ophthalmologists with basic and intermediate levels in jointly diagnosing 12 common ophthalmic diseases. Evaluated on a new large-scale ophthalmic disease diagnosis benchmark database, as well as a new large-scale segmentation and detection benchmark database, VisionFM outperformed strong baseline deep neural networks. The ophthalmic image representations learned by VisionFM exhibited noteworthy explainability, and demonstrated strong generalizability to new ophthalmic modalities, disease spectrum, and imaging devices. As a foundation model, VisionFM has a large capacity to learn from diverse ophthalmic imaging data and disparate datasets. To be commensurate with this capacity, in addition to the real data used for pre-training, we also generated and leveraged synthetic ophthalmic imaging data. Experimental results revealed that synthetic data that passed visual Turing tests, can also enhance the representation learning capability of VisionFM, leading to substantial performance gains on downstream ophthalmic AI tasks. Beyond the ophthalmic AI applications developed, validated, and demonstrated in this work, substantial further applications can be achieved in an efficient and cost-effective manner using VisionFM as the foundation.

Submitted 7 October, 2023; originally announced October 2023.

arXiv:2309.07524 [pdf, other]

doi 10.1109/TGRS.2024.3368760

A Multi-scale Generalized Shrinkage Threshold Network for Image Blind Deblurring in Remote Sensing

Authors: Yujie Feng, Yin Yang, Xiaohong Fan, Zhengpeng Zhang, Jianping Zhang

Abstract: Remote sensing images are essential for many applications of the earth's sciences, but their quality can usually be degraded due to limitations in sensor technology and complex imaging environments. To address this, various remote sensing image deblurring methods have been developed to restore sharp and high-quality images from degraded observational data. However, most traditional model-based deblurring methods usually require predefined hand-crafted prior assumptions, which are difficult to handle in complex applications. On the other hand, deep learning-based deblurring methods are often considered as black boxes, lacking transparency and interpretability. In this work, we propose a new blind deblurring learning framework that utilizes alternating iterations of shrinkage thresholds. This framework involves updating blurring kernels and images, with a theoretical foundation in network design. Additionally, we propose a learnable blur kernel proximal mapping module to improve the accuracy of the blur kernel reconstruction. Furthermore, we propose a deep proximal mapping module in the image domain, which combines a generalized shrinkage threshold with a multi-scale prior feature extraction block. This module also incorporates an attention mechanism to learn adaptively the importance of prior information, improving the flexibility and robustness of prior terms, and avoiding limitations similar to hand-crafted image prior terms. Consequently, we design a novel multi-scale generalized shrinkage threshold network (MGSTNet) that focuses specifically on learning deep geometric prior features to enhance image restoration. Experimental results on real and synthetic remote sensing image datasets demonstrate the superiority of our MGSTNet framework compared to existing deblurring methods.

Submitted 21 February, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

Comments: 16 pages, accepted to IEEE Transactions on Geoscience and Remote Sensing, 2024

MSC Class: 54H30; 68U10; 94A08

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, 2024

arXiv:2309.07136 [pdf, other]

Masked Transformer for Electrocardiogram Classification

Authors: Ya Zhou, Xiaolin Diao, Yanni Huo, Yang Liu, Xiaohan Fan, Wei Zhao

Abstract: Electrocardiogram (ECG) is one of the most important diagnostic tools in clinical applications. With the advent of advanced algorithms, various deep learning models have been adopted for ECG tasks. However, the potential of Transformer for ECG data has not been fully realized, despite their widespread success in computer vision and natural language processing. In this work, we present Masked Transformer for ECG classification (MTECG), a simple yet effective method which significantly outperforms recent state-of-the-art algorithms in ECG classification. Our approach adapts the image-based masked autoencoders to self-supervised representation learning from ECG time series. We utilize a lightweight Transformer for the encoder and a 1-layer Transformer for the decoder. The ECG signal is split into a sequence of non-overlapping segments along the time dimension, and learnable positional embeddings are added to preserve the sequential information. We construct the Fuwai dataset comprising 220,251 ECG recordings with a broad range of diagnoses, annotated by medical experts, to explore the potential of Transformer. A strong pre-training and fine-tuning recipe is proposed from the empirical study. The experiments demonstrate that the proposed method increases the macro F1 scores by 3.4%-27.5% on the Fuwai dataset, 9.9%-32.0% on the PTB-XL dataset, and 9.4%-39.1% on a multicenter dataset, compared to the alternative methods. We hope that this study could direct future research on the application of Transformer to more ECG tasks.

Submitted 22 April, 2024; v1 submitted 31 August, 2023; originally announced September 2023.

Comments: more experimental results; more implementation details; different abstracts
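
Note: the MTECG abstract above says the ECG signal is split into non-overlapping segments along the time dimension. The following is only a minimal sketch of that one step, not the authors' implementation. The function name `split_into_segments`, the `(leads, samples)` array layout, the dropping of trailing samples, and the flattening of each segment into one token vector are all assumptions made for illustration.

```python
import numpy as np


def split_into_segments(signal: np.ndarray, segment_len: int) -> np.ndarray:
    """Split a (leads, samples) ECG array into non-overlapping time segments.

    Hypothetical helper: returns an array of shape
    (n_segments, leads * segment_len), one flattened "token" per segment.
    Trailing samples that do not fill a whole segment are dropped (assumption).
    """
    n_leads, n_samples = signal.shape
    n_segments = n_samples // segment_len
    trimmed = signal[:, : n_segments * segment_len]
    # (leads, n_segments, segment_len) -> (n_segments, leads, segment_len)
    segments = trimmed.reshape(n_leads, n_segments, segment_len).transpose(1, 0, 2)
    return segments.reshape(n_segments, n_leads * segment_len)


if __name__ == "__main__":
    toy = np.arange(12).reshape(2, 6)  # 2 leads, 6 samples
    print(split_into_segments(toy, 3))
```

In a masked-autoencoder setup, a learnable positional embedding would then be added to each row before the encoder; that part is omitted here because the abstract gives no further detail.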

arXiv:2309.05226 [pdf, other]

Joint Beamforming and Compression Design for Per-Antenna Power Constrained Cooperative Cellular Networks

Authors: Xilai Fan, Ya-Feng Liu, Bo Jiang

Abstract: In the cooperative cellular network, relay-like base stations are connected to the central processor (CP) via rate-limited fronthaul links and the joint processing is performed at the CP, which thus can effectively mitigate the multiuser interference. In this paper, we consider the joint beamforming and compression problem with per-antenna power constraints in the cooperative cellular network. We first establish the equivalence between the considered problem and its semidefinite relaxation (SDR). Then we further derive the partial Lagrangian dual of the SDR problem and show that the objective function of the obtained dual problem is differentiable. Based on the differentiability, we propose two efficient projected gradient ascent algorithms for solving the dual problem, which are projected exact gradient ascent (PEGA) and projected inexact gradient ascent (PIGA). While PEGA is guaranteed to find the global solution of the dual problem (and hence the global solution of the original problem), PIGA is more computationally efficient due to the lower complexity in inexactly computing the gradient. Global optimality and high efficiency of the proposed algorithms are demonstrated via numerical experiments.

Submitted 23 December, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

Comments: 5 pages, 2 figures, accepted for publication in IEEE ICASSP 2024

arXiv:2309.04171 [pdf, other]

PRISTA-Net: Deep Iterative Shrinkage Thresholding Network for Coded Diffraction Patterns Phase Retrieval

Authors: Aoxu Liu, Xiaohong Fan, Yin Yang, Jianping Zhang

Abstract: The problem of phase retrieval (PR) involves recovering an unknown image from limited amplitude measurement data and is a challenging nonlinear inverse problem in computational imaging and image processing. However, many of the PR methods are based on black-box network models that lack interpretability and plug-and-play (PnP) frameworks that are computationally complex and require careful parameter tuning. To address this, we have developed PRISTA-Net, a deep unfolding network (DUN) based on the first-order iterative shrinkage thresholding algorithm (ISTA). This network utilizes a learnable nonlinear transformation to address the proximal-point mapping sub-problem associated with the sparse priors, and an attention mechanism to focus on phase information containing image edges, textures, and structures. Additionally, the fast Fourier transform (FFT) is used to learn global features to enhance local information, and the designed logarithmic-based loss function leads to significant improvements when the noise level is low. All parameters in the proposed PRISTA-Net framework, including the nonlinear transformation, threshold parameters, and step size, are learned end-to-end instead of being manually set. This method combines the interpretability of traditional methods with the fast inference ability of deep learning and is able to handle noise at each iteration during the unfolding stage, thus improving recovery quality. Experiments on Coded Diffraction Patterns (CDPs) measurements demonstrate that our approach outperforms the existing state-of-the-art methods in terms of qualitative and quantitative evaluations. Our source codes are available at https://github.com/liuaxou/PRISTA-Net.
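As background for the ISTA iteration that PRISTA-Net unfolds, here is a minimal sketch of classical ISTA on a linear sparse-recovery problem. It is not the authors' network: the linear operator `A`, the fixed soft threshold, and the names `soft_threshold` and `ista` are assumptions made for illustration, whereas the abstract describes a learned transformation, learned thresholds and step size, and a nonlinear phase-retrieval measurement model.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, i.e. elementwise shrinkage toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, y, lam, n_iter=500):
    """Minimise 0.5 * ||A x - y||^2 + lam * ||x||_1 with plain (hand-tuned) ISTA."""
    # Step size 1/L, where L is the Lipschitz constant of the gradient (squared spectral norm).
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)                          # gradient of the data-fidelity term
        x = soft_threshold(x - step * grad, step * lam)   # proximal (shrinkage) step
    return x
```

A deep unfolding network would, roughly speaking, fix the number of iterations and replace the hand-set `step`, `lam`, and shrinkage with trainable components; that part is deliberately left out of this sketch.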

Submitted 8 September, 2023; originally announced September 2023.

Comments: 12 pages

arXiv:2308.05880 [pdf, other]

Smart Data Mapping for Connecting Power System Model and Geospatial Data

Authors: Xue Li, Kishan Prudhvi Guddanti, Samrat Acharya, Patrick Royer, Xiaoyuan Fan, Marcelo Elizondo

Abstract: Knowing the geospatial locations of power system model elements and linking load models with end users and their communities are the foundation for analyzing system resilience and vulnerability to natural hazards. However, power system models and geospatial data for power grid assets are often developed asynchronously without close coordination. Creating a direct mapping between the two is a challenging task, mainly due to heterogeneous data structures, target uses, historical legacies, and human errors. This work aims to build an automatic data mapping workflow to connect the two, and to support energy grid resilience studies for Puerto Rico. The primary steps in this workflow include constructing graphs using geospatial data, and aligning them to the transmission networks defined in the power system data. The results have been evaluated against existing manual mapping practices for part of the Puerto Rico Power Grid model to illustrate the performance of such auto-mapping solutions.

Submitted 10 August, 2023; originally announced August 2023.

Comments: 5 pages, 1 figure, 1 table

arXiv:2308.03807 [pdf, other]

doi 10.1109/TCI.2023.3315853

Nest-DGIL: Nesterov-optimized Deep Geometric Incremental Learning for CS Image Reconstruction

Authors: Xiaohong Fan, Yin Yang, Ke Chen, Yujie Feng, Jianping Zhang

Abstract: Proximal gradient-based optimization is one of the most common strategies to solve inverse problems of images, and it is easy to implement. However, these techniques often generate heavy artifacts in image reconstruction. One of the most popular refinement methods is to fine-tune the regularization parameter to alleviate such artifacts, but it may not always be sufficient or applicable due to increased computational costs. In this work, we propose a deep geometric incremental learning framework based on the second Nesterov proximal gradient optimization. The proposed end-to-end network not only has the powerful learning ability for high-/low-frequency image features, but also can theoretically guarantee that geometric texture details will be reconstructed from preliminary linear reconstruction. Furthermore, it can avoid the risk of intermediate reconstruction results falling outside the geometric decomposition domains and achieve fast convergence. Our reconstruction framework is decomposed into four modules including general linear reconstruction, cascade geometric incremental restoration, Nesterov acceleration, and post-processing. In the image restoration step, a cascade geometric incremental learning module is designed to compensate for missing texture information from different geometric spectral decomposition domains. Inspired by the overlap-tile strategy, we also develop a post-processing module to remove the block effect in patch-wise-based natural image reconstruction. All parameters in the proposed model are learnable, and an adaptive initialization technique for the physical parameters is also employed to make the model flexible and ensure smooth convergence. We compare the reconstruction performance of the proposed method with existing state-of-the-art methods to demonstrate its superiority. Our source codes are available at https://github.com/fanxiaohong/Nest-DGIL.
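To make the idea of Nesterov acceleration on top of a proximal gradient step concrete, here is a minimal sketch of a standard FISTA-style accelerated proximal gradient loop. It is a generic textbook scheme and not necessarily the specific "second Nesterov" variant or the learned modules the abstract refers to; the function name `accelerated_proximal_gradient` and the callables `grad_f` and `prox` are hypothetical names chosen for illustration.

```python
import numpy as np

def accelerated_proximal_gradient(grad_f, prox, x0, step, n_iter=200):
    """Minimise f(x) + g(x), given the gradient of f and the proximal operator of g.

    grad_f(z)     -> gradient of the smooth term f at z
    prox(v, step) -> proximal operator of step * g evaluated at v
    """
    x_prev = np.asarray(x0, dtype=float)
    z = x_prev.copy()   # extrapolated point at which the gradient is taken
    t = 1.0             # momentum sequence
    for _ in range(n_iter):
        x = prox(z - step * grad_f(z), step)                 # proximal gradient step at z
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))    # update momentum sequence
        z = x + ((t - 1.0) / t_next) * (x - x_prev)          # Nesterov extrapolation
        x_prev, t = x, t_next
    return x_prev
```

As a usage example, passing `prox = lambda v, s: np.maximum(v, 0.0)` (projection onto the nonnegative orthant) turns the loop into an accelerated solver for nonnegative least squares.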

Submitted 11 October, 2023; v1 submitted 6 August, 2023; originally announced August 2023.

Comments: 15 pages, our source codes are available at https://github.com/fanxiaohong/Nest-DGIL

Journal ref: This work is published in IEEE Transactions on Computational Imaging, vol. 9, pp. 819-833, 2023

arXiv:2307.02701 [pdf]

Touch, press and stroke: a soft capacitive sensor skin

Authors: Mirza S. Sarwar, Ryusuke Ishizaki, Kieran Morton, Claire Preston, Tan Nguyen, Xu Fan, Bertille Dupont, Leanna Hogarth, Takahide Yoshiike, Shahriar Mirabbasi, John D. W. Madden

Abstract: Soft sensors that can discriminate shear and normal force could help provide machines the fine control desirable for safe and effective physical interactions with people. A capacitive sensor is made for this purpose, composed of patterned elastomer and containing both fixed and sliding pillars that allow the sensor to deform and buckle, much like skin itself. The sensor differentiates between simultaneously applied pressure and shear. In addition, finger proximity is detectable up to 15 mm, with a pressure and shear sensitivity of 1 kPa and a displacement resolution of 50 μm. The operation is demonstrated on a simple gripper holding a cup. The combination of features and the straightforward fabrication method make this sensor a candidate for implementation as a sensing skin for humanoid robotics applications.

Submitted 5 July, 2023; originally announced July 2023.

Comments: 9 pages, 5 figures, submitted to Scientific Reports Nature

arXiv:2306.13962 [pdf, other]

QoS-based Beamforming and Compression Design for Cooperative Cellular Networks via Lagrangian Duality

Authors: Xilai Fan, Ya-Feng Liu, Liang Liu, Tsung-Hui Chang

Abstract: This paper considers the quality-of-service (QoS)-based joint beamforming and compression design problem in the downlink cooperative cellular network, where multiple relay-like base stations (BSs), connected to the central processor via rate-limited fronthaul links, cooperatively transmit messages to the users. The problem of interest is formulated as the minimization of the total transmit power of the BSs, subject to all users' signal-to-interference-plus-noise ratio (SINR) constraints and all BSs' fronthaul rate constraints. In this paper, we first show that there is no duality gap between the considered joint optimization problem and its Lagrangian dual by showing the tightness of its semidefinite relaxation (SDR). Then, we propose an efficient algorithm based on the above duality result for solving the considered problem. The proposed algorithm judiciously exploits the special structure of the enhanced Karush-Kuhn-Tucker (KKT) conditions of the considered problem and finds the solution that satisfies the enhanced KKT conditions via two fixed point iterations. Two key features of the proposed algorithm are: (1) it is able to detect whether the considered problem is feasible or not and find its globally optimal solution when it is feasible; (2) it is highly efficient because both of the fixed point iterations in the proposed algorithm are linearly convergent and evaluating the functions in the fixed point iterations is computationally cheap. Numerical results show the global optimality and efficiency of the proposed algorithm.

Submitted 24 June, 2023; originally announced June 2023.

Comments: 15 pages, 7 figures, submitted for possible publication

arXiv:2305.12701 [pdf, other]

More Perspectives Mean Better: Underwater Target Recognition and Localization with Multimodal Data via Symbiotic Transformer and Multiview Regression

Authors: Shipei Liu, Xiaoya Fan, Guowei Wu

Abstract: Underwater acoustic target recognition (UATR) and localization (UATL) play important roles in marine exploration. The highly noisy acoustic signal and time-frequency interference among various sources pose big challenges to this task. To tackle these issues, we propose a multimodal approach to extract and fuse audio-visual-textual information to recognize and localize underwater targets through the designed Symbiotic Transformer (Symb-Trans) and Multi-View Regression (MVR) method. The multimodal data were first preprocessed by a custom-designed HetNorm module to normalize the multi-source data in a common feature space. The Symb-Trans module embeds audiovisual features by co-training the preprocessed multimodal features through parallel branches and a content encoder with cross-attention. The audiovisual features are then used for underwater target recognition. Meanwhile, the text embedding combined with the audiovisual features is fed to an MVR module to predict the localization of the underwater targets through multi-view clustering and multiple regression. Since no off-the-shelf multimodal dataset is available for UATR and UATL, we combined multiple public datasets, consisting of acoustic, and/or visual, and/or textual data, to obtain audio-visual-textual triplets for model training and validation. Experiments show that our model outperforms comparative methods in 91.7% (11 out of 12 metrics) and 100% (4 metrics) of the quantitative metrics for the recognition and localization tasks, respectively. In a case study, we demonstrate the advantages of multi-view models in establishing sample discriminability through visualization methods. For UATL, the proposed MVR method produces the relation graphs, which allow predictions based on records of underwater targets with similar conditions.

Submitted 22 May, 2023; originally announced May 2023.

arXiv:2303.11661 [pdf, other]

Advanced Multi-Microscopic Views Cell Semi-supervised Segmentation

Authors: Fang Hu, Xuexue Sun, Ke Qing, Fenxi Xiao, Zhi Wang, Xiaolu Fan

Abstract: Although deep learning (DL) shows powerful potential in cell segmentation tasks, it suffers from poor generalization as DL-based methods originally simplified cell segmentation in detecting cell membrane boundary, lacking prominent cellular structures to position overall differentiating. Moreover, the scarcity of annotated cell images limits the performance of DL models. Segmentation limitations of a single category of cell make massive practice difficult, much less, with varied modalities. In this paper, we introduce a novel semi-supervised cell segmentation method called Multi-Microscopic-view Cell semi-supervised Segmentation (MMCS), which can train cell segmentation models well using fewer labeled multi-posture cell images from different microscopy modalities. Technically, MMCS consists of Nucleus-assisted global recognition, Self-adaptive diameter filter, and Temporal-ensembling models. Nucleus-assisted global recognition adds an additional cell nucleus channel to improve the global distinguishing performance of fuzzy cell membrane boundaries even when cells aggregate. Besides, the self-adaptive cell diameter filter can help separate multi-resolution cells with different morphology properly. It further leverages the temporal-ensembling models to improve the semi-supervised training process, achieving effective training with less labeled data. Additionally, optimizing the weight that the unlabeled loss contributes to the total loss also improves the model performance. Evaluated on the Tuning Set of NeurIPS 2022 Cell Segmentation Challenge (NeurIPS CellSeg), MMCS achieves an F1-score of 0.8239 and the running time for all cases is within the time tolerance.

Submitted 21 March, 2023; originally announced March 2023.

Comments: 23 pages

arXiv:2302.12368 [pdf, other]

Power System Recovery Coordinated with (Non-)Black-Start Generators

Authors: Meng Zhao, Patrick R. Maloney, Xinda Ke, Juan Carlos Bedoya Ceballos, Xiaoyuan Fan, Marcelo A. Elizondo

Abstract: Power restoration is an urgent task after a black-out, and recovery efficiency is critical when quantifying system resilience. Multiple elements should be considered to restore the power system quickly and safely. This paper proposes a recovery model to solve a direct-current optimal power flow (DCOPF) based on mixed-integer linear programming (MILP). Since most of the generators cannot start independently, the interaction between black-start (BS) and non-black-start (NBS) generators must be modeled appropriately. The energization status of the NBS is coordinated with the recovery status of transmission lines, and both of them are modeled as binary variables. Also, only after an NBS unit receives cranking power through connected transmission lines, will it be allowed to participate in the following system dispatch. The amount of cranking power is estimated as a fixed proportion of the maximum generation capacity. The proposed model is validated on several test systems, as well as a 1393-bus representation system of the Puerto Rican electric power grid. Test results demonstrate how the recovery of NBS units and damaged transmission lines can be optimized, resulting in an efficient and well-coordinated recovery procedure.

Submitted 23 February, 2023; originally announced February 2023.

Comments: 5 pages, 6 figures

arXiv:2211.03545 [pdf, other]

ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech

Authors: Xiaoran Fan, Chao Pang, Tian Yuan, He Bai, Renjie Zheng, Pengfei Zhu, Shuohuan Wang, Junkun Chen, Zeyu Chen, Liang Huang, Yu Sun, Hua Wu

Abstract: Speech representation learning has improved both speech understanding and speech synthesis tasks for a single language. However, its ability in cross-lingual scenarios has not been explored. In this paper, we extend the pretraining method for cross-lingual multi-speaker speech synthesis tasks, including cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing. We propose a speech-text joint pretraining framework, where we randomly mask the spectrogram and the phonemes given a speech example and its transcription. By learning to reconstruct the masked parts of the input in different languages, our model shows great improvements over speaker-embedding-based multi-speaker TTS methods. Moreover, our framework is end-to-end for both the training and the inference without any finetuning effort. In cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing tasks, our experiments show that our model outperforms speaker-embedding-based multi-speaker TTS methods.

Submitted 4 December, 2022; v1 submitted 7 November, 2022; originally announced November 2022.

arXiv:2211.00150 [pdf, other]

A 5G Enabled Adaptive Computing Workflow for Greener Power Grid

Authors: Yousu Chen, Liwei Wang, Xiaoyuan Fan, Dexin Wang, James Ogle

Abstract: 5G wireless technology can deliver higher data speeds, ultra low latency, more reliability, massive network capacity, increased availability, and a more uniform user experience to users. It brings additional power to help address the challenges brought by renewable integration and decarbonization. In this paper, a 5G enabled adaptive computing workflow is presented that consists of various computing resources, such as 5G equipment, edge computing, cluster, graphics processing unit (GPU) and cloud computing, with two examples showing technical feasibility for edge-grid-cloud interaction for power system real-time monitoring, security assessment, and forecasting. Benefiting from the high speed data transport and massive connection capability of 5G, the workflow shows its potential to seamlessly integrate various applications at distributed and/or centralized locations to build more complex and powerful functions, with better flexibility.

Submitted 31 October, 2022; originally announced November 2022.

Comments: 5 pages, 6 figures

arXiv:2210.16705 [pdf, other]

Distributed Swarm Learning for Internet of Things at the Edge: Where Artificial Intelligence Meets Biological Intelligence

Authors: Yue Wang, Zhi Tian, Xin Fan, Yan Huo, Cameron Nowzari, Kai Zeng

Abstract: With the proliferation of versatile Internet of Things (IoT) services, smart IoT devices are increasingly deployed at the edge of wireless networks to perform collaborative machine learning tasks using locally collected data, giving rise to the edge learning paradigm. Due to device restrictions and resource constraints, edge learning among massive IoT devices faces major technical challenges caused by the communication bottleneck, data and device heterogeneity, non-convex optimization, privacy and security concerns, and dynamic environments. To overcome these challenges, this article studies a new framework of distributed swarm learning (DSL) through a holistic integration of artificial intelligence and biological swarm intelligence. Leveraging efficient and robust signal processing and communication techniques, DSL contributes to novel tools for learning and optimization tailored for real-time operations of large-scale IoT in edge wireless environments, which will benefit a wide range of edge IoT applications.

Submitted 29 October, 2022; originally announced October 2022.

arXiv:2210.06836 [pdf, other]

SNN-SC: A Spiking Semantic Communication Framework for Feature Transmission

Authors: Mengyang Wang, Jiahui Li, Mengyao Ma, Xiaopeng Fan, Yonghong Tian

Abstract: In Collaborative Intelligence (CI), Artificial Intelligence (AI) models are split between edge devices and cloud. Features extracted from input on edge devices are transmitted to the cloud for subsequent tasks. Extracting task-related and compact information is critical when transmission bandwidth is limited. In this paper, we propose a task-oriented Semantic Communication (SC) framework (SNN-SC) to address this problem. In SC, only important information for downstream tasks is extracted and transmitted. However, most of the existing SC works only transmit analog information on the AWGN channel and cannot be directly used for digital channels. SNN-SC fills this gap by using Spiking Neural Networks (SNNs) to extract and transmit semantic information. Since the outputs of SNNs are binary spikes, SNN-SC can be directly applied to digital channels. In SNN-SC, a new spiking neuron is proposed to help the cloud recover binary semantic information into informative floating-point features. Furthermore, we improve the performance of SNN-SC by maximizing the entropy of the semantic information. We evaluate the performance of SNN-SC on different collaborative classification models, digital channels, and bandwidths. The experimental results show that SNN-SC is more robust than the CNN-based SC framework and separate source and channel coding method on digital channels.

Submitted 17 April, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

arXiv:2208.05578 [pdf, other]

CB-DSL: Communication-efficient and Byzantine-robust Distributed Swarm Learning on Non-i.i.d. Data

Authors: Xin Fan, Yue Wang, Yan Huo, Zhi Tian

Abstract: The valuable data collected by IoT devices in edge networks together with the resurgence of ML stimulate the latest trend of edge AI. However, recent FL methods face major challenges including communication bottleneck, data heterogeneity and security concerns in edge IoT scenarios, especially when being adopted for distributed learning among massive IoT devices equipped with limited data and transmission resources. Meanwhile, the swarm nature of IoT systems is overlooked by most existing literature, which calls for new designs of distributed learning algorithms. Inspired by the success of biological intelligence (BI) of gregarious organisms, we propose a novel edge learning approach for swarm IoT, called communication-efficient and Byzantine-robust distributed swarm learning (CB-DSL), through a holistic integration of AI-enabled stochastic gradient descent and BI-enabled particle swarm optimization. To deal with non-i.i.d. data issues and Byzantine attacks, global data samples are introduced in CB-DSL and shared among IoT workers, which not only alleviates the local data heterogeneity effectively but also enables full use of the exploration-exploitation mechanism of swarm intelligence. Further, we provide convergence analysis to theoretically demonstrate that the proposed CB-DSL is superior to the standard FL with better convergence behavior. In addition, to measure the effectiveness of the introduction of the globally shared dataset, we also evaluate the model divergence by deriving its upper bound, which is related to the distance between the data distribution at local IoT devices and the population distribution for the whole datasets. Numerical results verify that the proposed CB-DSL outperforms the existing benchmarks in terms of faster convergence speed, higher convergent accuracy, lower communication cost, and better robustness against non-i.i.d. data and Byzantine attacks.

Submitted 20 October, 2022; v1 submitted 10 August, 2022; originally announced August 2022.

Comments: update theoretical and simulation results

arXiv:2207.04213 [pdf, other]

Dual-Path Cross-Modal Attention for better Audio-Visual Speech Extraction

Authors: Zhongweiyang Xu, Xulin Fan, Mark Hasegawa-Johnson

Abstract: Audio-visual target speech extraction, which aims to extract a certain speaker's speech from the noisy mixture by looking at lip movements, has made significant progress combining time-domain speech separation models and visual feature extractors (CNN). One problem of fusing audio and video information is that they have different time resolutions. Most current research upsamples the visual features along the time dimension so that audio and video features are able to align in time. However, we believe that lip movement should mostly contain long-term, or phone-level information. Based on this assumption, we propose a new way to fuse audio-visual features. We observe that for DPRNN, the inter-chunk dimension's time resolution could be very close to the time resolution of video frames. As in SepFormer, the LSTM in DPRNN is replaced by intra-chunk and inter-chunk self-attention, but in the proposed algorithm, inter-chunk attention incorporates the visual features as an additional feature stream. This avoids upsampling the visual cues, resulting in more efficient audio-visual fusion. The results show that our method outperforms other time-domain based audio-visual fusion models.

Submitted 3 March, 2023; v1 submitted 9 July, 2022; originally announced July 2022.

Comments: Paper Accepted by ICASSP2023

arXiv:2206.07008 [pdf, other]

doi 10.1109/LSP.2022.3184251

Constellation Design for Deep Joint Source-Channel Coding

Authors: Mengyang Wang, Jiahui Li, Mengyao Ma, Xiaopeng Fan

Abstract: Deep learning-based joint source-channel coding (JSCC) has shown excellent performance in image and feature transmission. However, the output values of the JSCC encoder are continuous, which makes the constellation of modulation complex and dense. It is hard and expensive to design radio frequency chains for transmitting such full-resolution constellation points. In this paper, two methods of mapping the full-resolution constellation to finite constellation are proposed for real system implementation. The constellation mapping results of the proposed methods correspond to regular constellation and irregular constellation, respectively. We apply the methods to existing deep JSCC models and evaluate them on AWGN channels with different signal-to-noise ratios (SNRs). Experimental results show that the proposed methods outperform the traditional uniform quadrature amplitude modulation (QAM) constellation mapping method by only adding a few additional parameters.
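To illustrate what mapping a full-resolution constellation onto a finite one can look like, here is a minimal sketch that snaps continuous complex symbols to the nearest point of a uniform square QAM grid. It is an assumed reading of the uniform QAM baseline mentioned in the abstract, not the two methods the paper proposes; the function names `uniform_qam_points` and `map_to_constellation` and the unnormalised odd-integer levels are choices made for illustration only.

```python
import numpy as np

def uniform_qam_points(order=16):
    """Return a square QAM grid with odd-integer levels, e.g. {-3, -1, 1, 3} per axis for 16-QAM."""
    side = int(round(np.sqrt(order)))
    if side * side != order:
        raise ValueError("order must be a perfect square")
    levels = np.arange(-(side - 1), side, 2)
    # Cartesian product of in-phase and quadrature levels, flattened to a 1-D array.
    return (levels[:, None] + 1j * levels[None, :]).ravel()

def map_to_constellation(symbols, points):
    """Map each continuous complex symbol to its nearest constellation point."""
    symbols = np.asarray(symbols, dtype=complex)
    # Pairwise Euclidean distances between symbols (rows) and constellation points (columns).
    idx = np.argmin(np.abs(symbols[:, None] - points[None, :]), axis=1)
    return points[idx]
```

For example, `map_to_constellation([0.2 + 0.9j], uniform_qam_points(16))` snaps the symbol to `1 + 1j`. In a learned system the hard `argmin` would need a differentiable surrogate during training, which this sketch does not attempt.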

Submitted 7 June, 2022; originally announced June 2022.

arXiv:2206.03361 [pdf, other]

Hierarchical Similarity Learning for Aliasing Suppression Image Super-Resolution

Authors: Yuqing Liu, Qi Jia, Jian Zhang, Xin Fan, Shanshe Wang, Siwei Ma, Wen Gao

Abstract: As a highly ill-posed issue, single image super-resolution (SISR) has been widely investigated in recent years. The main task of SISR is to recover the information loss caused by the degradation procedure. According to the Nyquist sampling theory, the degradation leads to aliasing effect and makes it hard to restore the correct textures from low-resolution (LR) images. In practice, there are correlations and self-similarities among the adjacent patches in the natural images. This paper considers the self-similarity and proposes a hierarchical image super-resolution network (HSRNet) to suppress the influence of aliasing. We consider the SISR issue from the optimization perspective, and propose an iterative solution pattern based on the half-quadratic splitting (HQS) method. To explore the texture with local image prior, we design a hierarchical exploration block (HEB) and progressively increase the receptive field. Furthermore, multi-level spatial attention (MSA) is devised to obtain the relations of adjacent features and enhance the high-frequency information, which plays a crucial role in visual experience. Experimental results show that HSRNet achieves better quantitative and visual performance than other works, and mitigates the aliasing more effectively.

Submitted 7 June, 2022; originally announced June 2022.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2205.10146 [pdf, other]

Revisiting GANs by Best-Response Constraint: Perspective, Methodology, and Application

Authors: Risheng Liu, Jiaxin Gao, Xuan Liu, Xin Fan

Abstract: In past years, the minimax type single-level optimization formulation and its variations have been widely utilized to address Generative Adversarial Networks (GANs). Unfortunately, it has been proved that these alternating learning strategies cannot exactly reveal the intrinsic relationship between the generator and discriminator, and thus easily result in a series of issues, including mode collapse, vanishing gradients and oscillations in the training phase. In this work, by investigating the fundamental mechanism of GANs from the perspective of hierarchical optimization, we propose Best-Response Constraint (BRC), a general learning framework that can explicitly formulate the potential dependency of the generator on the discriminator. Rather than adopting the existing time-consuming bilevel iterations, we design an implicit gradient scheme with outer-product Hessian approximation as our fast solution strategy. Notably, we demonstrate that even with different motivations and formulations, a variety of existing GANs can all be uniformly improved by our flexible BRC methodology. Extensive quantitative and qualitative experimental results verify the effectiveness, flexibility and stability of our proposed framework.

Submitted 20 May, 2022; originally announced May 2022.

Comments: 11 pages

arXiv:2205.08579 [pdf, other]

The Power of Fragmentation: A Hierarchical Transformer Model for Structural Segmentation in Symbolic Music Generation

Authors: Guowei Wu, Shipei Liu, Xiaoya Fan

Abstract: Symbolic Music Generation relies on the contextual representation capabilities of the generative model, where the most prevalent approach is the Transformer-based model. The learning of musical context is also related to the structural elements in music, i.e., intro, verse, and chorus, which are currently overlooked by the research community. In this paper, we propose a hierarchical Transformer model to learn multi-scale contexts in music. In the encoding phase, we first design a Fragment Scope Localization layer to syncopate the music into chords and sections. Then, we use a multi-scale attention mechanism to learn note-, chord-, and section-level contexts. In the decoding phase, we propose a hierarchical Transformer model that uses fine-decoders to generate sections in parallel and a coarse-decoder to decode the combined music. We also design a Music Style Normalization layer to achieve a consistent music style between the generated sections. Our model is evaluated on two open MIDI datasets, and experiments show that our model outperforms the best contemporary music generative models. More excitingly, the visual evaluation shows that our model is superior in melody reuse, resulting in more realistic music.

Submitted 10 July, 2022; v1 submitted 17 May, 2022; originally announced May 2022.

arXiv:2205.07062 [pdf, other]

doi 10.1016/j.bspc.2023.104821

An Interpretable MRI Reconstruction Network with Two-grid-cycle Correction and Geometric Prior Distillation

Authors: Xiaohong Fan, Yin Yang, Ke Chen, Jianping Zhang, Ke Dong

Abstract: Although existing deep learning compressed-sensing-based Magnetic Resonance Imaging (CS-MRI) methods have achieved considerably impressive performance, explainability and generalizability continue to be challenging for such methods, since the transition from mathematical analysis to network design is not always natural enough and most of them are not flexible enough to handle multi-sampling-ratio reconstruction assignments. In this work, to tackle explainability and generalizability, we propose a unifying deep unfolding multi-sampling-ratio interpretable CS-MRI framework. The combined approach offers more generalizability than previous works, whereas deep learning gains explainability through a geometric prior module. Inspired by the multigrid algorithm, we first embed the CS-MRI-based optimization algorithm into a correction-distillation scheme that consists of three ingredients: a pre-relaxation module, a correction module and a geometric prior distillation module. Furthermore, we employ a condition module to adaptively learn the step-length and noise level, which enables the proposed framework to jointly train multi-ratio tasks through a single model. The proposed model not only compensates for the lost contextual information of the reconstructed image, which is refined from the low-frequency error in geometric characteristic k-space, but also integrates the theoretical guarantee of model-based methods and the superior reconstruction performance of deep learning-based methods. Therefore, it can give us a novel perspective to design biomedical imaging networks.
Numerical experiments show that our framework outperforms state-of-the-art methods in terms of qualitative and quantitative evaluations. Our method achieves a 3.18 dB improvement at the low CS ratio of 10% and an average 1.42 dB improvement over other comparison methods on the brain dataset using a Cartesian sampling mask.

Submitted 5 March, 2023; v1 submitted 14 May, 2022; originally announced May 2022.

Comments: 14 pages, accepted to Biomedical Signal Processing and Control, March 2023

Journal ref: Biomedical Signal Processing and Control, vol 84, 2023

arXiv:2204.13952 [pdf, other]

Deep Geometry Post-Processing for Decompressed Point Clouds

Authors: Xiaoqing Fan, Ge Li, Dingquan Li, Yurui Ren, Wei Gao, Thomas H. Li

Abstract: Point cloud compression plays a crucial role in reducing the huge cost of data storage and transmission. However, distortions can be introduced into the decompressed point clouds due to quantization. In this paper, we propose a novel learning-based post-processing method to enhance the decompressed point clouds. Specifically, a voxelized point cloud is first divided into small cubes. Then, a 3D convolutional network is proposed to predict the occupancy probability for each location of a cube. We leverage both local and global contexts by generating multi-scale probabilities. These probabilities are progressively summed to predict the results in a coarse-to-fine manner. Finally, we obtain the geometry-refined point clouds based on the predicted probabilities. Different from previous methods, we deal with decompressed point clouds with a huge variety of distortions using a single model. Experimental results show that the proposed method can significantly improve the quality of the decompressed point clouds, achieving a 9.30 dB BDPSNR gain on average across three representative datasets.

Submitted 29 April, 2022; originally announced April 2022.

arXiv:2204.12039 [pdf, other]

Learning Weighting Map for Bit-Depth Expansion within a Rational Range

Authors: Yuqing Liu, Qi Jia, Jian Zhang, Xin Fan, Shanshe Wang, Siwei Ma, Wen Gao

Abstract: Bit-depth expansion (BDE) is one of the emerging technologies to display a high bit-depth (HBD) image from a low bit-depth (LBD) source. Existing BDE methods have no unified solution for various BDE situations, and directly learn a mapping for each pixel from the LBD image to the desired value in the HBD image, which may change the given high-order bits and lead to a huge deviation from the ground truth. In this paper, we design a bit restoration network (BRNet) to learn a weight for each pixel, which indicates the ratio of the replenished value within a rational range, invoking an accurate solution without modifying the given high-order bit information. To make the network adaptive to any bit-depth degradation, we investigate the issue from an optimization perspective and train the network under a progressive training strategy for better performance. Moreover, we employ Wasserstein distance as a visual quality indicator to evaluate the difference of color distribution between the restored image and the ground truth. Experimental results show our method can restore colorful images with fewer artifacts and false contours, and outperforms state-of-the-art methods with higher PSNR/SSIM results and lower Wasserstein distance. The source code will be made available at https://github.com/yuqing-liu-dut/bit-depth-expansion

Submitted 25 April, 2022; originally announced April 2022.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2203.09070 [pdf, other]

Proactive Posturing of Large Power Grid for Mitigating Hurricane Impacts

Authors: Edward Quarm Jnr., Xiaoyuan Fan, Marcelo Elizondo, Ramtin Madani

Abstract: In the past decade, natural disasters such as hurricanes have challenged the operation and control of the U.S. power grid. It is crucial to develop proactive strategies to assist grid operators for better emergency response and minimized electricity service interruptions; the better the grid may be preserved, the faster the grid can be restored. In this paper, we propose a proactive posturing of power system elements, and formulate a Security-Constrained Optimal Power Flow (SCOPF) informed by cross-domain hurricane modeling as well as its potential impacts on grid elements. Simulation results based on a real-world power grid and a historical hurricane event have verified the applicability of the proposed optimization formulation, which shows potential to enable grid operators and planners with interactive cross-domain data analytics for mitigating hurricane impacts.

Submitted 16 March, 2022; originally announced March 2022.

Comments: 5 pages, 2 figures, 1 table

arXiv:2203.05148 [pdf, other]

Efficient Topology Assessment for Integrated Transmission and Distribution Network with 10,000+ Inverter-based Resources

Authors: Tao Fu, Dexin Wang, Xiaoyuan Fan, Huiying Ren, Jim Ogle, Yousu Chen

Abstract: The renewable energy proliferation calls upon the grid operators and planners to systematically evaluate the potential impacts of distributed energy resources (DERs). Considering the significant differences between various inverter-based resources (IBRs), especially the different capabilities between grid-forming inverters and grid-following inverters, it is crucial to develop an efficient and effective assessment procedure besides the available co-simulation frameworks with high computation burdens. This paper presents a streamlined graph-based topology assessment for the integrated power system transmission and distribution networks. Graph analyses were performed based on the integrated graph of the modified miniWECC grid model and the IEEE 8500-node test feeder model; a high performance computing platform with 40 nodes and a total of 2400 CPUs has been utilized to process this integrated graph, which has 100,000+ nodes and 10,000+ IBRs. The node ranking results not only verified the applicability of the proposed method, but also revealed the potential of distributed grid forming (GFM) and grid following (GFL) inverters interacting with the centralized power plants.

Submitted 9 March, 2022; originally announced March 2022.

Comments: 5 pages, 5 figures, 3 tables

arXiv:2203.04517 [pdf]

Recovery Time Metric Demonstrated on Real-world Electric Grid for Hurricane Impacted Outages

Authors: Patrick Maloney, Xiaoyuan Fan, Marcelo Elizondo, Xue Li

Abstract: This work proposes a methodology for estimating recovery times for transmission lines and substations, and is demonstrated on a real-world 1269-bus power system model of Puerto Rico under 20 hurricane scenarios, or stochastic realizations of asset failure under the meteorological conditions of Hurricane Maria. The method defines base recovery times for system components and identifies factors that impact these base values by means of multipliers. While the method is tested on transmission lines and substation failures due to hurricanes, it is based on a generic process that could be applied to any system component or event as a general recovery time estimation framework. The results show that given the two failure modes under study (transmission towers and substations), transmission towers appear to have a greater impact on recovery time estimates despite substations being given longer base outage times. Additionally, the average recovery time for the simulated hurricanes across 20 scenarios is ~28,000 work crew days.

Submitted 8 March, 2022; originally announced March 2022.

Comments: 5 pages, 5 figures, 5 tables

arXiv:2201.01458 [pdf, other]

doi 10.1109/TCSVT.2021.3138431

Cross-SRN: Structure-Preserving Super-Resolution Network with Cross Convolution

Authors: Yuqing Liu, Qi Jia, Xin Fan, Shanshe Wang, Siwei Ma, Wen Gao

Abstract: It is challenging to restore low-resolution (LR) images to super-resolution (SR) images with correct and clear details. Existing deep learning works almost neglect the inherent structural information of images, which plays an important role in the visual perception of SR results. In this paper, we design a hierarchical feature exploitation network to probe and preserve structural information in a multi-scale feature fusion manner. First, we propose a cross convolution upon traditional edge detectors to localize and represent edge features. Then, cross convolution blocks (CCBs) are designed with feature normalization and channel attention to consider the inherent correlations of features. Finally, we leverage a multi-scale feature fusion group (MFFG) to embed the cross convolution blocks and develop the relations of structural features in different scales hierarchically, invoking a lightweight structure-preserving network named Cross-SRN. Experimental results demonstrate that Cross-SRN achieves competitive or superior restoration performance against the state-of-the-art methods with accurate and clear structural details. Moreover, we set a criterion to select images with rich structural textures. The proposed Cross-SRN outperforms the state-of-the-art methods on the selected benchmark, which demonstrates that our network has a significant advantage in preserving edges.

Submitted 7 January, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

arXiv:2112.05147 [pdf, other]

Learning Deep Context-Sensitive Decomposition for Low-Light Image Enhancement

Authors: Long Ma, Risheng Liu, Jiaao Zhang, Xin Fan, Zhongxuan Luo

Abstract: Enhancing the quality of low-light images plays a very important role in many image processing and multimedia applications. In recent years, a variety of deep learning techniques have been developed to address this challenging task. A typical framework is to simultaneously estimate the illumination and reflectance, but such frameworks disregard the scene-level contextual information encapsulated in feature spaces, causing many unfavorable outcomes, e.g., details loss, color unsaturation, artifacts, and so on. To address these issues, we develop a new context-sensitive decomposition network architecture to exploit the scene-level contextual dependencies on spatial scales. More concretely, we build a two-stream estimation mechanism including reflectance and illumination estimation networks. We design a novel context-sensitive decomposition connection to bridge the two-stream mechanism by incorporating the physical principle. The spatially-varying illumination guidance is further constructed for achieving the edge-aware smoothness property of the illumination component. According to different training patterns, we construct CSDNet (paired supervision) and CSDGAN (unpaired supervision) to fully evaluate our designed architecture. We test our method on seven testing benchmarks to conduct plenty of analytical and evaluated experiments. Thanks to our designed context-sensitive decomposition connection, we successfully realize excellent enhanced results, which fully indicates our superiority against existing state-of-the-art approaches.
Finally, considering the practical needs for high efficiency, we develop a lightweight CSDNet (named LiteCSDNet) by reducing the number of channels. Further, by sharing an encoder for these two components, we obtain a more lightweight version (SLiteCSDNet for short). SLiteCSDNet contains just 0.0301M parameters but achieves almost the same performance as CSDNet.

Submitted 9 December, 2021; originally announced December 2021.

Comments: Accepted by IEEE TNNLS. Code is available at https://github.com/KarelZhang/CSDNet-CSDGAN

arXiv:2112.00216 [pdf, other]

PoseKernelLifter: Metric Lifting of 3D Human Pose using Sound

Authors: Zhijian Yang, Xiaoran Fan, Volkan Isler, Hyun Soo Park

Abstract: Reconstructing the 3D pose of a person in metric scale from a single view image is a geometrically ill-posed problem. For example, we cannot measure the exact distance of a person to the camera from a single view image without additional scene assumptions (e.g., known height). Existing learning based approaches circumvent this issue by reconstructing the 3D pose up to scale. However, there are many applications such as virtual telepresence, robotics, and augmented reality that require metric scale reconstruction. In this paper, we show that audio signals recorded along with an image provide complementary information to reconstruct the metric 3D pose of the person. The key insight is that as the audio signals traverse across the 3D space, their interactions with the body provide metric information about the body's pose. Based on this insight, we introduce a time-invariant transfer function called pose kernel, the impulse response of audio signals induced by the body pose. The main properties of the pose kernel are that (1) its envelope highly correlates with 3D pose, (2) the time response corresponds to arrival time, indicating the metric distance to the microphone, and (3) it is invariant to changes in the scene geometry configurations. Therefore, it is readily generalizable to unseen scenes. We design a multi-stage 3D CNN that fuses audio and visual signals and learns to reconstruct 3D pose in a metric scale.
We show that our multi-modal method produces accurate metric reconstruction in real world scenes, which is not possible with state-of-the-art lifting approaches including parametric mesh regression and depth regression.

Submitted 2 December, 2021; v1 submitted 30 November, 2021; originally announced December 2021.

arXiv:2111.04459 [pdf, other]

doi 10.1109/TIP.2021.3128327

Triple-level Model Inferred Collaborative Network Architecture for Video Deraining

Authors: Pan Mu, Zhu Liu, Yaohua Liu, Risheng Liu, Xin Fan

Abstract: Video deraining is an important issue for outdoor vision systems and has been investigated extensively. However, designing optimal architectures by aggregating model formation and data distribution is a challenging task for video deraining. In this paper, we develop a model-guided triple-level optimization framework to deduce network architecture with cooperating optimization and auto-searching mechanism, named Triple-level Model Inferred Cooperating Searching (TMICS), for dealing with various video rain circumstances. In particular, to mitigate the problem that existing methods cannot cover various rain streak distributions, we first design a hyper-parameter optimization model about task variable and hyper-parameter. Based on the proposed optimization model, we design a collaborative structure for video deraining. This structure includes Dominant Network Architecture (DNA) and Companionate Network Architecture (CNA) that cooperate by introducing an Attention-based Averaging Scheme (AAS). To better explore inter-frame information from videos, we introduce a macroscopic structure searching scheme that searches from Optical Flow Module (OFM) and Temporal Grouping Module (TGM) to help restore the latent frame. In addition, we apply differentiable neural architecture searching from a compact candidate set of task-specific operations to discover desirable rain streak removal architectures automatically.
Extensive experiments on various datasets demonstrate that our model shows significant improvements in fidelity and temporal consistency over the state-of-the-art works. Source code is available at https://github.com/vis-opt-group/TMICS.

Submitted 8 November, 2021; originally announced November 2021.

Comments: Accepted at IEEE Transactions on Image Processing

arXiv:2110.05085 [pdf, other]

Efficiently and Globally Solving Joint Beamforming and Compression Problem in the Cooperative Cellular Network via Lagrangian Duality

Authors: Xilai Fan, Ya-Feng Liu, Liang Liu

Abstract: Consider the joint beamforming and quantization problem in the cooperative cellular network, where multiple relay-like base stations (BSs) connected to the central processor (CP) via rate-limited fronthaul links cooperatively serve the users. This problem can be formulated as the minimization of the total transmit power, subject to all users' signal-to-interference-plus-noise-ratio (SINR) constraints and all relay-like BSs' fronthaul rate constraints. In this paper, we first show that there is no duality gap between the considered problem and its Lagrangian dual by showing the tightness of the semidefinite relaxation (SDR) of the considered problem. Then we propose an efficient algorithm based on Lagrangian duality for solving the considered problem. The proposed algorithm judiciously exploits the special structure of the Karush-Kuhn-Tucker (KKT) conditions of the considered problem and finds the solution that satisfies the KKT conditions via two fixed-point iterations. The proposed algorithm is highly efficient (as evaluating the functions in both fixed-point iterations is computationally cheap) and is guaranteed to find the global solution of the problem. Simulation results show the efficiency and the correctness of the proposed algorithm.

Submitted 10 February, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

Comments: 5 pages, 1 figure, accepted for publication in IEEE ICASSP 2022

arXiv:2109.09813 [pdf, other]

A Data-Driven Democratized Control Architecture for Regional Transmission Operators

Authors: Xiaoyuan Fan, Daniel Moscovitz, Liang Du, Walid Saad

Abstract: As probably the most complicated and critical infrastructure system, U.S. power grids become increasingly vulnerable to extreme events such as cyber-attacks and severe weather, as well as higher DER penetrations and growing information mismatch among system operators, utilities (transmission or generation owners), and end-users. This paper proposes a data-driven democratized control architecture considering two democratization pathways to assist transmission system operators, with a targeted use case of developing online proactive islanding strategies. Detailed discussions on load capability profiling at transmission buses and disaggregation of DER generations are provided and illustrated with real-world utility data. By combining network and operational constraints, transmission system operators can be equipped with new tools built on top of this architecture, to derive accurate, proactive, and strategic islanding decisions to incorporate the wide range of dynamic portfolios and needs when facing extreme events or unseen grid contingencies.

Submitted 20 September, 2021; originally announced September 2021.

Comments: 5 pages, 3 figures, conference

arXiv:2107.07658 [pdf]

Retention time trajectory matching for target compound peak identification in chromatographic analysis

Authors: Wenzhe Zang, Ruchi Sharma, Maxwell Wei-Hao Li, Xudong Fan

Abstract: Retention time drift caused by fluctuations in physical factors such as temperature ramping rate and carrier gas flow rate is ubiquitous in chromatographic measurements. Proper peak identification and alignment across different chromatograms is critical prior to any subsequent analysis. This work introduces a peak identification method called retention time trajectory (RTT) matching, which uses chromatographic retention times as the only input and identifies peaks associated with any subset of a predefined set of target compounds. RTT matching is also capable of reporting interferents. An RTT is a 2-dimensional (2D) curve formed uniquely by the retention times of the chromatographic peaks. The RTTs obtained from the chromatogram of a test sample and of a pre-characterized library are matched and statistically compared. The best matched pair implies identification. Unlike most existing peak alignment methods, no mathematical warping or transformations are involved. Based on the experimentally characterized RTT, an RTT hybridization method is developed to rapidly generate more RTTs without performing actual time-consuming chromatographic measurements. This enables successful identification even for chromatograms with serious retention time drift. Experimentally obtained gas chromatograms and publicly available fruit metabolomics liquid chromatograms are used to generate over two trillion tests that validate the proposed method, demonstrating real-time peak/interferent identification.

Submitted 15 July, 2021; originally announced July 2021.

arXiv:2107.05160 [pdf, other]

Spatial and Temporal Networks for Facial Expression Recognition in the Wild Videos

Authors: Shuyi Mao, Xinqi Fan, Xiaojiang Peng

Abstract: The paper describes our proposed methodology for the seven basic expression classification track of the Affective Behavior Analysis in-the-wild (ABAW) Competition 2021. In this task, facial expression recognition (FER) methods aim to classify the correct expression category from a diverse background, but there are several challenges. First, to adapt the model to in-the-wild scenarios, we use the knowledge from pre-trained large-scale face recognition data. Second, we propose an ensemble model with a convolution neural network (CNN), a CNN-recurrent neural network (CNN-RNN), and a CNN-Transformer, to incorporate both spatial and temporal information. Our ensemble model achieved an F1 of 0.4133, an accuracy of 0.6216 and a final metric of 0.4821 on the validation set.

Submitted 11 July, 2021; originally announced July 2021.

arXiv:2107.04943 [pdf, other]

doi 10.1109/BHI50953.2021.9508565

Deep Geometric Distillation Network for Compressive Sensing MRI

Authors: Xiaohong Fan, Yin Yang, Jianping Zhang

Abstract: Compressed sensing (CS) is an efficient method to reconstruct MR images from a small amount of sampled data in k-space and accelerate the acquisition of MRI. In this work, we propose a novel deep geometric distillation network which combines the merits of model-based and deep learning-based CS-MRI methods; it can be theoretically guaranteed to improve the geometric texture details of a linear reconstruction. Firstly, we unfold the model-based CS-MRI optimization problem into two sub-problems that consist of image linear approximation and image geometric compensation. Secondly, the geometric compensation sub-problem for distilling lost texture details in the approximation stage can be expanded by Taylor expansion to design a geometric distillation module fusing features of different geometric characteristic domains. Additionally, we use a learnable version with adaptive initialization of the step-length parameter, which gives the model more flexibility and can lead to smooth convergence. Numerical experiments verify its superiority over other state-of-the-art CS-MRI reconstruction approaches. The source code will be available at https://github.com/fanxiaohong/Deep-Geometric-Distillation-Network-for-CS-MRI

Submitted 27 August, 2021; v1 submitted 10 July, 2021; originally announced July 2021.

Comments: Accepted by IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI), 2021

MSC Class: 68T05; 68T20; 68T09; 68W25 ACM Class: F.2.2; I.2.7

arXiv:2103.10511 [pdf, other]

EMS and DMS Integration of the Coordinative Real-time Sub-Transmission Volt-Var Control Tool under High DER Penetration

Authors: Quan Nguyen, Jim Ogle, Xiaoyuan Fan, Xinda Ke, Mallikarjuna R. Vallem, Nader Samaan, Ning Lu

Abstract: This paper proposes an applicable approach to deploy the Coordinative Real-time Sub-Transmission Volt-Var Control Tool (CReST-VCT), and a holistic system integration framework considering both the energy management system (EMS) and distribution system management system (DMS). This provides an architectural basis and can serve as the implementation guideline of CReST-VCT and other advanced grid support tools, to co-optimize the operation benefits of distributed energy resources (DERs) and assets in both transmission and distribution networks. Potential communication protocols for different physical domains of a real application are included. Performance and security issues are also discussed, along with specific considerations for field deployment. Finally, the paper presents a viable pathway for CReST-VCT and other advanced grid support tools to be integrated in an open-source standardized-based platform that supports distribution utilities.

Submitted 18 March, 2021; originally announced March 2021.

Comments: 5 pages, 3 figures, conference

arXiv:2103.08667 [pdf]

Model Validation Study for Central American Regional Electrical Interconnected System

Authors: Xiaoyuan Fan, Marcelo A. Elizondo, Pavel V. Etingov, Mallikarjuna R. Vallem, Shuchismita Biswas, Seemita Pal, Carlos Erroa, Christian Munoz, Daniel Polanco, Victor Villeda

Abstract: The Central American Regional Interconnected Power System (SER) connects six countries: Guatemala, El Salvador, Honduras, Nicaragua, Costa Rica, and Panama. It is operated by the regional system operator Ente Operador Regional (EOR). Due to its geographical shape and the layout of its major transmission lines, SER has a weakly meshed grid, where disturbances can easily propagate, challenging its reliability. Having an accurate dynamic model is important for EOR when facing those reliability challenges. This paper describes interconnection-level model validation efforts for the SER and Mexico interconnected system. A detailed equivalent model of Mexico is incorporated in the existing SER planning model used by EOR. The resultant model is then validated using simulated dynamic contingency analysis and real system disturbance data. A fully automated suite of scripts is also developed and shared with EOR engineers. This work helps EOR improve their routine validation practices and continuously improve the SER dynamic model, and hence its reliability.

Submitted 15 March, 2021; originally announced March 2021.

Comments: 5 pages, 6 figures, conference

arXiv:2102.00202 [pdf, other]

SNR-adaptive deep joint source-channel coding for wireless image transmission

Authors: Mingze Ding, Jiahui Li, Mengyao Ma, Xiaopeng Fan

Abstract: Considering the problem of joint source-channel coding (JSCC) for multi-user transmission of images over noisy channels, an autoencoder-based novel deep joint source-channel coding scheme is proposed in this paper. In the proposed JSCC scheme, the decoder can estimate the signal-to-noise ratio (SNR) and use it to adaptively decode the transmitted image. Experiments demonstrate that the proposed scheme achieves impressive results in adaptability for different SNRs and is robust to the decoder's estimation error of the SNR. To the best of our knowledge, this is the first deep JSCC scheme that focuses on the adaptability for different SNRs and can be applied to multi-user scenarios.

Submitted 2 February, 2021; v1 submitted 30 January, 2021; originally announced February 2021.

Comments: Accepted in IEEE ICASSP 2021
