Search | arXiv e-print repository

A Single-Step Non-Autoregressive Automatic Speech Recognition Architecture with High Accuracy and Inference Speed

Authors: Ziyang Zhuang, Chenfeng Miao, Kun Zou, Shuai Gong, Ming Fang, Tao Wei, Zijian Li, Wei Hu, Shaojun Wang, **g Xiao

Abstract: Non-autoregressive (NAR) automatic speech recognition (ASR) models predict tokens independently and simultaneously, bringing high inference speed. However, NAR models still lag behind autoregressive (AR) models in accuracy. To further narrow the gap between NAR and AR models, we propose a single-step NAR ASR architecture with high accuracy and inference speed, called EfficientASR. It uses an Index Mapping Vector (IMV)-based alignment generator to generate alignments during training, and an alignment predictor to learn the alignments for inference. It can be trained end-to-end (E2E) with cross-entropy loss combined with alignment loss. The proposed EfficientASR achieves competitive results on the AISHELL-1 and AISHELL-2 benchmarks compared to state-of-the-art (SOTA) models. Specifically, it achieves character error rates (CER) of 4.26%/4.62% on the AISHELL-1 dev/test sets, outperforming the SOTA AR Conformer with an approximately 30x inference speedup.

Submitted 13 June, 2024; originally announced June 2024.
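CER, the metric reported above, is conventionally the character-level edit distance between hypothesis and reference divided by the reference length. A minimal sketch, assuming corpus-level aggregation (errors and reference characters summed over all utterances); the function names and example sentences are hypothetical and not taken from the EfficientASR code.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # row for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free when characters match)
            )
        prev = curr
    return prev[-1]


def corpus_cer(refs: list[str], hyps: list[str]) -> float:
    """Corpus-level CER: total edit distance divided by total reference characters."""
    errors = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


refs = ["今天天气很好", "我们去公园"]
hyps = ["今天天气好", "我们去公园"]  # one deleted character in the first utterance
print(f"CER = {corpus_cer(refs, hyps):.2%}")  # prints CER = 9.09%
```

Per-utterance CERs can also be averaged instead; the two aggregations differ when utterance lengths vary, so which one a paper uses matters when comparing numbers.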

arXiv:2403.15448 [pdf, other]

What is Wrong with End-to-End Learning for Phase Retrieval?

Authors: Wenjie Zhang, Yuxiang Wan, Zhong Zhuang, Ju Sun

Abstract: For nonlinear inverse problems that are prevalent in imaging science, symmetries in the forward model are common. When data-driven deep learning approaches are used to solve such problems, these intrinsic symmetries can cause substantial learning difficulties. In this paper, we explain how such difficulties arise and, more importantly, how to overcome them by preprocessing the training set before any learning, i.e., symmetry breaking. We take far-field phase retrieval (FFPR), which is central to many areas of scientific imaging, as an example and show that symmetry breaking can substantially improve data-driven learning. We also formulate the mathematical principle of symmetry breaking.

Submitted 17 March, 2024; originally announced March 2024.
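The symmetries in question can be made concrete. If the far-field measurement is modelled as the magnitude of the 2-D DFT, then a global phase factor, a circular translation, and a conjugate flip (x[n] → conj(x[-n mod N])) all leave the measured magnitudes unchanged, so a network regressing x from its Fourier magnitude sees identical inputs paired with different targets. A minimal NumPy sketch, assuming this simplified DFT model; the paper's oversampled setting and its symmetry-breaking preprocessing are not reproduced, and the helper names are hypothetical.

```python
import numpy as np

def fourier_magnitude(img: np.ndarray) -> np.ndarray:
    """Simplified far-field measurement model: magnitude of the 2-D DFT."""
    return np.abs(np.fft.fft2(img))

def trivial_ambiguities(x: np.ndarray, shift=(2, 3), theta=0.7) -> dict:
    """Images whose Fourier magnitudes coincide with those of x."""
    global_phase = np.exp(1j * theta) * x
    translated = np.roll(x, shift=shift, axis=(0, 1))  # circular translation
    # conjugate flip: y[n] = conj(x[-n mod N]) along both axes
    flipped = np.conj(np.roll(np.flip(x, axis=(0, 1)), shift=(1, 1), axis=(0, 1)))
    return {"global_phase": global_phase, "translation": translated, "conjugate_flip": flipped}

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
for name, y in trivial_ambiguities(x).items():
    same_meas = np.allclose(fourier_magnitude(x), fourier_magnitude(y))
    print(f"{name}: identical magnitudes = {same_meas}, identical images = {np.allclose(x, y)}")
```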

arXiv:2312.15721 [pdf, ps, other]

UAV Trajectory Tracking via RNN-enhanced IMM-KF with ADS-B Data

Authors: Yian Zhu, Ziye Jia, Qihui Wu, Chao Dong, Zirui Zhuang, Huiling Hu, Qi Cai

Abstract: With the increasing use of autonomous unmanned aerial vehicles (UAVs), it is critical to ensure that they are continuously tracked and controlled, especially when UAVs operate beyond the communication range of ground stations (GSs). Conventional surveillance methods for UAVs, such as satellite communications, ground mobile networks, and radars, suffer from high costs and latency. Automatic dependent surveillance-broadcast (ADS-B) emerges as a promising method to monitor UAVs due to its real-time capability, easy deployment, and affordable cost. Therefore, we employ ADS-B for UAV trajectory tracking in this work. However, the inherent noise in the transmitted data is an obstacle to tracking UAVs precisely. Hence, we propose a recurrent neural network-enhanced interacting multiple model Kalman filter (RNN-enhanced IMM-KF) algorithm for UAV trajectory filtering. Specifically, the algorithm uses the RNN to capture the maneuvering behavior of UAVs and the noise level in the ADS-B data. Moreover, accurate UAV tracking is achieved by adaptively adjusting the process noise matrix and observation noise matrix of the IMM-KF with the assistance of the RNN. The proposed algorithm can help GSs make timely decisions during UAV trajectory deviations and improve airspace safety. Finally, comprehensive simulations show that the total root mean square error of the proposed algorithm is 28.56% lower than that of the traditional IMM-KF.

Submitted 25 December, 2023; originally announced December 2023.
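The abstract's central idea, adapting the process-noise matrix Q and the observation-noise matrix R, plugs into the standard Kalman predict/update recursion. Below is a minimal single-model sketch assuming a 1-D constant-velocity target; Q and R are fixed scalars here where, per the abstract, the RNN would supply them, and the IMM mixing across several motion models is omitted. The function name and all numeric settings are hypothetical.

```python
import numpy as np

def kalman_track(zs, dt=1.0, q=0.01, r=25.0):
    """Filter 1-D position measurements zs with a constant-velocity model.

    q scales the process-noise matrix Q and r is the observation-noise variance R;
    an adaptive scheme would update them online (not done in this sketch).
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])        # state transition for [position, velocity]
    H = np.array([[1.0, 0.0]])                   # only position is observed
    Q = q * np.array([[dt**4 / 4, dt**3 / 2],    # white-noise-acceleration process noise
                      [dt**3 / 2, dt**2]])
    R = np.array([[r]])
    x = np.array([zs[0], 0.0])                   # initial state from the first measurement
    P = np.eye(2) * 100.0                        # large initial uncertainty
    estimates = []
    for z in zs:
        x = F @ x                                # predict
        P = F @ P @ F.T + Q
        innovation = z - H @ x                   # update
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ innovation
        P = (np.eye(2) - K @ H) @ P
        estimates.append(x[0])
    return np.array(estimates)


rng = np.random.default_rng(1)
t = np.arange(200.0)
truth = 5.0 + 2.0 * t                             # constant-velocity trajectory
zs = truth + rng.normal(0.0, 5.0, size=t.shape)   # noisy position reports
est = kalman_track(zs)
rmse_raw = np.sqrt(np.mean((zs - truth) ** 2))
rmse_kf = np.sqrt(np.mean((est - truth) ** 2))
print(f"RMSE raw={rmse_raw:.2f}, filtered={rmse_kf:.2f}")
```

Mis-set Q or R degrades this recursion (too small a Q lags maneuvers, too small an R trusts noisy reports), which is the failure mode that adapting them online is meant to address.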

arXiv:2312.00951 [pdf, other]

AV4EV: Open-Source Modular Autonomous Electric Vehicle Platform for Making Mobility Research Accessible

Authors: Zhijie Qiao, Mingyan Zhou, Zhijun Zhuang, Tejas Agarwal, Felix Jahncke, Po-Jen Wang, Jason Friedman, Hongyi Lai, Divyanshu Sahu, Tomáš Nagy, Martin Endler, Jason Schlessman, Rahul Mangharam

Abstract: When academic researchers develop and validate autonomous driving algorithms, there is a challenge in balancing high-performance capabilities with the cost and complexity of the vehicle platform. Much of today's research on autonomous vehicles (AV) is limited to experimentation on expensive commercial vehicles that require large skilled teams to retrofit the vehicles and test them in dedicated facilities. On the other hand, 1/10th-1/16th scaled-down vehicle platforms are more affordable but have limited similitude in performance and drivability. To address this issue, we present the design of a one-third-scale autonomous electric go-kart platform with open-source mechatronics design along with fully functional autonomous driving software. The platform's multi-modal driving system supports manual, autonomous, and teleoperation driving modes. It also features a flexible sensing suite for algorithm deployment across perception, localization, planning, and control. This development serves as a bridge between full-scale vehicles and reduced-scale cars while accelerating cost-effective algorithmic advancements. Our experimental results demonstrate the AV4EV platform's capabilities and ease of use for developing new AV algorithms. All materials are available at AV4EV.org to stimulate collaborative efforts within the AV and electric vehicle (EV) communities.

Submitted 12 April, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

Comments: 6 pages, 5 figures

arXiv:2309.07152 [pdf]

Novel Smart N95 Filtering Facepiece Respirator with Real-time Adaptive Fit Functionality and Wireless Humidity Monitoring for Enhanced Wearable Comfort

Authors: Kangkyu Kwon, Yoon Jae Lee, Yeongju Jung, Ira Soltis, Chanyeong Choi, Yewon Na, Lissette Romero, Myung Chul Kim, Nathan Rodeheaver, Hodam Kim, Michael S. Lloyd, Ziqing Zhuang, William King, Susan Xu, Seung-Hwan Ko, **woo Lee, Woon-Hong Yeo

Abstract: The widespread emergence of the COVID-19 pandemic has transformed our lifestyle, and facial respirators have become an essential part of daily life. Nevertheless, current respirators have several limitations, such as poor fit, because they cannot accommodate the diverse sizes and shapes of human faces, potentially diminishing the effectiveness of wearing them. In addition, current facial respirators do not inform the user of the air quality within the facepiece during continuous long-term use. Here, we demonstrate a novel smart N95 filtering facepiece respirator that incorporates a humidity sensor and a pressure-sensory-feedback-enabled self-fit adjusting function, so that the respirator effectively prevents the transmission of airborne pathogens. Laser-induced graphene (LIG) constitutes the humidity sensor, and a pressure sensor array based on a dielectric elastomeric sponge monitors the respirator's contact with the user's face, providing the sensory information for a closed-loop feedback mechanism. As a result of the self-fit adjusting mode along with the elastomeric lining, the fit factor increases 3.20-fold on average and 5-fold at maximum. We expect that the experimental proof-of-concept of this work will offer viable solutions to the limitations of current commercial respirators.

Submitted 8 September, 2023; originally announced September 2023.

Comments: 20 pages, 5 figures, 1 table, submitted for possible publication

MSC Class: 92C55

arXiv:2307.10316 [pdf, other]

CPCM: Contextual Point Cloud Modeling for Weakly-supervised Point Cloud Semantic Segmentation

Authors: Lizhao Liu, Zhuangwei Zhuang, Shangxin Huang, Xunlong Xiao, Tianhang Xiang, Cen Chen, **gdong Wang, Mingkui Tan

Abstract: We study the task of weakly-supervised point cloud semantic segmentation with sparse annotations (e.g., less than 0.1% of points are labeled), aiming to reduce the expensive cost of dense annotations. Unfortunately, with extremely sparse annotated points, it is very difficult to extract both contextual and object information for scene understanding such as semantic segmentation. Motivated by masked modeling (e.g., MAE) in image and video representation learning, we seek to bring the power of masked modeling to learning contextual information from sparsely-annotated points. However, directly applying MAE to 3D point clouds with sparse annotations may fail to work. First, it is nontrivial to effectively mask out the informative visual context from 3D point clouds. Second, how to fully exploit the sparse annotations for context modeling remains an open question. In this paper, we propose a simple yet effective Contextual Point Cloud Modeling (CPCM) method that consists of two parts: a region-wise masking (RegionMask) strategy and a contextual masked training (CMT) method. Specifically, RegionMask masks the point cloud continuously in geometric space to construct a meaningful masked prediction task for subsequent context learning. CMT disentangles the learning of supervised segmentation and unsupervised masked context prediction to effectively learn from the very limited labeled points and the mass of unlabeled points, respectively. Extensive experiments on the widely-tested ScanNet V2 and S3DIS benchmarks demonstrate the superiority of CPCM over state-of-the-art methods.

Submitted 19 July, 2023; originally announced July 2023.

Comments: Accepted by ICCV 2023
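The abstract describes RegionMask only as masking the point cloud "continuously in geometric space". One plausible reading, assumed here rather than taken from the paper, is to bucket points into cubic voxels and mask whole voxels, so that masked points form contiguous regions instead of scattered singletons. A minimal NumPy sketch; `region_mask`, `region_size`, and `mask_ratio` are hypothetical names.

```python
import numpy as np

def region_mask(points, region_size, mask_ratio, rng):
    """Mask whole cubic regions of a point cloud.

    points: (N, 3+) array whose first three columns are xyz coordinates.
    Returns a boolean array of length N, True for masked points.
    """
    cells = np.floor(points[:, :3] / region_size).astype(np.int64)  # voxel index per point
    _, inverse = np.unique(cells, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)                  # flatten for consistency across NumPy versions
    n_regions = int(inverse.max()) + 1
    n_masked = int(round(mask_ratio * n_regions))
    chosen = rng.choice(n_regions, size=n_masked, replace=False)
    region_is_masked = np.zeros(n_regions, dtype=bool)
    region_is_masked[chosen] = True
    return region_is_masked[inverse]              # every point inherits its region's flag


rng = np.random.default_rng(0)
points = rng.uniform(0.0, 10.0, size=(2000, 3))
mask = region_mask(points, region_size=2.0, mask_ratio=0.5, rng=rng)
print(f"masked {mask.mean():.1%} of points")
```

Masking at region granularity means a masked point's immediate neighbours are usually masked too, so the prediction task cannot be solved by interpolating from adjacent visible points.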

arXiv:2306.12098 [pdf, other]

MSW-Transformer: Multi-Scale Shifted Windows Transformer Networks for 12-Lead ECG Classification

Authors: Renjie Cheng, Zhemin Zhuang, Shuxin Zhuang, Lei Xie, **gfeng Guo

Abstract: Automatic classification of electrocardiogram (ECG) signals plays a crucial role in the early prevention and diagnosis of cardiovascular diseases. While ECG signals can be used to diagnose various diseases, their pathological characteristics exhibit only minimal variations, posing a challenge to automatic classification models. Existing methods primarily use convolutional neural networks to extract ECG signal features for classification, which may not fully capture the differences in pathological features between diseases. Transformer networks have advantages in feature extraction for sequence data, but the complete network is complex and relies on large-scale datasets. To address these challenges, we propose a single-layer Transformer network called Multi-Scale Shifted Windows Transformer Networks (MSW-Transformer), which uses a multi-window sliding attention mechanism at different scales to capture features in different dimensions. Self-attention is restricted to non-overlapping local windows via shifted windows, and different window scales have different receptive fields. A learnable feature fusion method is then proposed to integrate features from different windows to further enhance model performance. Furthermore, we visualize the attention of the multi-window shifted mechanism to achieve better clinical interpretation in the ECG classification task.
The proposed model achieves state-of-the-art performance on five classification tasks of the PTBXL-2020 12-lead ECG dataset, covering 5 diagnostic superclasses, 23 diagnostic subclasses, 12 rhythm classes, 17 morphology classes, and 44 diagnosis classes, with average macro-F1 scores of 77.85%, 47.57%, 66.13%, 34.60%, and 34.29%, and average sample-F1 scores of 81.26%, 68.27%, 91.32%, 50.07%, and 63.19%, respectively.

Submitted 21 June, 2023; originally announced June 2023.

arXiv:2305.19931 [pdf, other]

Asymptotic Performance Analysis of Large-Scale Active IRS-Aided Wireless Network

Authors: Yan Wang, Feng Shu, Zhihong Zhuang, Rongen Dong, Qi Zhang, Di Wu, Liang Yang, Jiangzhou Wang

Abstract: In this paper, we study the dominant factor affecting the performance of active intelligent reflecting surface (IRS)-aided wireless communication networks in Rayleigh fading channels, namely the average signal-to-noise ratio (SNR) $γ_0$ at the IRS. Using the weak law of large numbers, we derive a simple asymptotic expression for it as the number $N$ of IRS elements grows to medium and large scale. When $N$ tends to large scale, the asymptotic received SNR at the user is proved to be a linearly increasing function of the product of $γ_0$ and $N$. Moreover, when the BS transmit power is fixed, there exists an optimal limited reflected power at the IRS; beyond this point, additional IRS reflected power degrades the SNR performance. Additionally, under a total power constraint on the sum of the BS transmit power and the power reflected by the IRS, an optimal power allocation (PA) strategy is derived and shown to achieve a rate gain of 0.83 bit over equal PA. Finally, we consider an IRS with finite-resolution phase shifters, which generate phase quantization errors that further degrade receive performance. Closed-form expressions for the resulting performance loss in the user's asymptotic SNR, achievable rate (AR), and bit error rate (BER) are derived for the active IRS. Numerical simulation results show that a 3-bit discrete phase shifter is required to achieve a negligible performance loss for a large-scale active IRS.

Submitted 5 June, 2023; v1 submitted 31 May, 2023; originally announced May 2023.
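The 3-bit conclusion can be sanity-checked with a common large-N approximation; this is an assumption here, not the paper's closed-form active-IRS expressions. If each element's ideal phase is rounded to the nearest of 2^b uniformly spaced levels, the error e is uniform on [-π/2^b, π/2^b], the coherently combined amplitude shrinks by E[cos e] = sin(π/2^b)/(π/2^b), and the signal power shrinks by its square. The sketch compares that formula with a Monte Carlo check; the function names are hypothetical.

```python
import numpy as np

def coherent_power_factor(bits):
    """Large-N power factor (E[cos e])^2 for e ~ Uniform[-pi/2^b, pi/2^b]."""
    half_step = np.pi / 2**bits
    return (np.sin(half_step) / half_step) ** 2

def simulated_power_factor(bits, n_elements, rng):
    """Monte Carlo check: quantize ideal phases to 2^b uniform levels, combine coherently."""
    step = 2 * np.pi / 2**bits
    ideal = rng.uniform(0.0, 2 * np.pi, size=n_elements)
    quantized = np.round(ideal / step) * step
    error = quantized - ideal                       # uniform on [-step/2, step/2]
    return np.abs(np.mean(np.exp(1j * error))) ** 2

rng = np.random.default_rng(0)
for b in range(1, 5):
    analytic = coherent_power_factor(b)
    simulated = simulated_power_factor(b, 100_000, rng)
    print(f"{b}-bit: loss = {-10 * np.log10(analytic):.2f} dB "
          f"(simulated {-10 * np.log10(simulated):.2f} dB)")
```

Under this approximation the loss is about 3.92 dB with 1-bit phase shifters and about 0.22 dB with 3 bits, consistent with the abstract's observation that 3 bits suffice for a negligible loss; the paper's own expressions may differ in detail.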

arXiv:2211.00799 [pdf, other]

Practical Phase Retrieval Using Double Deep Image Priors

Authors: Zhong Zhuang, David Yang, Felix Hofmann, David Barmherzig, Ju Sun

Abstract: Phase retrieval (PR) concerns the recovery of complex phases from complex magnitudes. We identify the connection between the difficulty level and the number and variety of symmetries in PR problems. We focus on the most difficult far-field PR (FFPR), and propose a novel method using double deep image priors. In realistic evaluation, our method outperforms all competing methods by large margins. As a single-instance method, our method requires no training data and minimal hyperparameter tuning, and hence enjoys good practicality.

Submitted 1 November, 2022; originally announced November 2022.

arXiv:2208.09483 [pdf, other]

doi 10.1007/s11263-023-01883-x

Blind Image Deblurring with Unknown Kernel Size and Substantial Noise

Authors: Zhong Zhuang, Taihui Li, Hengkang Wang, Ju Sun

Abstract: Blind image deblurring (BID) has been extensively studied in computer vision and adjacent fields. Modern methods for BID can be grouped into two categories: single-instance methods that deal with individual instances using statistical inference and numerical optimization, and data-driven methods that train deep-learning models to deblur future instances directly. Data-driven methods can be free from the difficulty of deriving accurate blur models, but are fundamentally limited by the diversity and quality of the training data; collecting sufficiently expressive and realistic training data is a standing challenge. In this paper, we focus on single-instance methods, which remain competitive and indispensable. However, most such methods do not prescribe how to deal with unknown kernel size and substantial noise, precluding practical deployment. Indeed, we show that several state-of-the-art (SOTA) single-instance methods are unstable when the kernel size is overspecified and/or the noise level is high. On the positive side, we propose a practical BID method that is stable against both, the first of its kind. Our method builds on the recent ideas of solving inverse problems by integrating physical models and structured deep neural networks, without extra training data. We introduce several crucial modifications to achieve the desired stability.
Extensive empirical tests on standard synthetic datasets, as well as the real-world NTIRE2020 and RealBlur datasets, show the superior effectiveness and practicality of our BID method compared to SOTA single-instance as well as data-driven methods. The code of our method is available at https://github.com/sun-umn/Blind-Image-Deblurring.

Submitted 15 September, 2023; v1 submitted 18 August, 2022; originally announced August 2022.

Journal ref: International Journal of Computer Vision, 2023

arXiv:2208.01227 [pdf, ps, other]

Optimal Measurement of Drone Swarm in RSS-based Passive Localization with Region Constraints

Authors: Xin Cheng, Feng Shu, Yifan Li, Zhihong Zhuang, Di Wu, Jiangzhou Wang

Abstract: Passive geolocation by multiple unmanned aerial vehicles (UAVs) covers a wide range of military and civilian applications, including rescue, wildlife tracking, and electronic warfare. The sensor-target geometry is known to significantly affect localization precision. Existing sensor placement strategies mainly address cases without any constraints on the sensor locations. However, UAVs cannot simply fly or hover in arbitrary regions due to realistic constraints, such as geographical limitations, security issues, and maximum flying speed. In this paper, optimal geometrical configurations of UAVs in received signal strength (RSS)-based localization under region constraints are investigated. The optimization problem is formulated using the D-optimality criterion, i.e., maximizing the determinant of the Fisher information matrix (FIM). Based on rigorous algebraic and geometrical derivations, optimal closed-form configurations of UAVs under different flying states are proposed. Finally, the effectiveness and practicality of the proposed configurations are demonstrated by simulation examples.

Submitted 7 August, 2022; v1 submitted 1 August, 2022; originally announced August 2022.
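To make the D-optimality criterion concrete, assume the common log-distance path-loss model (the paper's exact RSS model may differ): RSS_k = P_0 - 10 n log10 d_k + w_k with w_k ~ N(0, σ²). For a 2-D target x and sensors s_k, the FIM is then J = b² Σ_k (x - s_k)(x - s_k)ᵀ / d_k⁴ with b = 10n / (σ ln 10). The sketch evaluates det J for sensors on a circle around the target and brute-forces the best bearings inside a hypothetical 90° feasible arc that stands in for a region constraint; all names and numbers are illustrative.

```python
import itertools
import numpy as np

def rss_fim(target, sensors, path_loss_exp=2.0, sigma_db=4.0):
    """FIM of a 2-D target position under an assumed log-distance RSS model."""
    diff = target - sensors                           # (K, 2) rows x - s_k
    d2 = np.sum(diff**2, axis=1)                      # squared distances d_k^2
    b = 10.0 * path_loss_exp / (sigma_db * np.log(10.0))
    return b**2 * np.einsum("ki,kj,k->ij", diff, diff, 1.0 / d2**2)

def on_circle(angles_deg, radius):
    """Sensor positions at the given bearings around a target at the origin."""
    a = np.deg2rad(np.asarray(angles_deg, dtype=float))
    return radius * np.column_stack([np.cos(a), np.sin(a)])

target = np.zeros(2)
radius = 10.0
det_spread = np.linalg.det(rss_fim(target, on_circle([0, 120, 240], radius)))
det_clustered = np.linalg.det(rss_fim(target, on_circle([0, 10, 20], radius)))

# Hypothetical region constraint: UAVs may only occupy bearings in [0, 90] degrees.
feasible = range(0, 91, 5)
best = max(
    itertools.combinations(feasible, 3),
    key=lambda angs: np.linalg.det(rss_fim(target, on_circle(angs, radius))),
)
print(f"det J: spread={det_spread:.3e}, clustered={det_clustered:.3e}, best feasible bearings={best}")
```

At equal range, evenly spread bearings (120° apart) give a far larger det J than clustered ones. Inside the 90° arc, any three-sensor configuration that uses both arc endpoints is optimal under this model, which illustrates how a region constraint changes the optimal geometry.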

arXiv:2205.11346 [pdf, other]

Spatial Attention-based Implicit Neural Representation for Arbitrary Reduction of MRI Slice Spacing

Authors: Xin Wang, Sheng Wang, Honglin Xiong, Kai Xuan, Zixu Zhuang, Mengjun Liu, Zhenrong Shen, Xiangyu Zhao, Lichi Zhang, Qian Wang

Abstract: Magnetic resonance (MR) images collected in 2D clinical protocols typically have large inter-slice spacing, resulting in high in-plane resolution and reduced through-plane resolution. Super-resolution techniques can enhance the through-plane resolution of MR images to facilitate downstream visualization and computer-aided diagnosis. However, most existing works train the super-resolution network at a fixed scaling factor, which is ill-suited to clinical scenarios with varying inter-slice spacing in MR scanning. Inspired by recent progress in implicit neural representation, we propose a Spatial Attention-based Implicit Neural Representation (SA-INR) network for arbitrary reduction of MR inter-slice spacing. SA-INR represents an MR image as a continuous implicit function of 3D coordinates. In this way, SA-INR can reconstruct the MR image with arbitrary inter-slice spacing by continuously sampling coordinates in 3D space. In particular, a local-aware spatial attention operation is introduced to model nearby voxels and their affinity more accurately in a larger receptive field. Meanwhile, to improve computational efficiency, a gradient-guided gating mask is proposed to apply the local-aware spatial attention only to selected areas. We evaluate our method on the public HCP-1200 dataset and a clinical knee MR dataset to demonstrate its superiority over other existing methods.

Submitted 19 March, 2023; v1 submitted 23 May, 2022; originally announced May 2022.

arXiv:2204.09411 [pdf, ps, other]

Two Low-complexity DOA Estimators for Massive/Ultra-massive MIMO Receive Array

Authors: Yiwen Chen, Xichao Zhan, Feng Shu, Qijuan Jie, Xin Cheng, Zhihong Zhuang, Jiangzhou Wang

Abstract: Eigen-decomposition-based direction-finding methods for large-scale/ultra-large-scale fully-digital receive antenna arrays incur high or ultra-high complexity. To address this complexity dilemma, three low-complexity estimators are proposed in this paper: partitioned subarray auto-correlation combining (PSAC), partitioned subarray cross-correlation combining (PSCC), and power iteration max correlation successive convex approximation (PI-Max-CSCA). Compared with conventional non-partitioned direction-finding methods such as root multiple signal classification (Root-MUSIC), the PSAC method partitions the full set of antennas equally into subsets called subarrays; each subarray performs independent DOA estimation, and all DOA estimates are coherently combined to give the final estimate. For better performance, the PSCC method further exploits the cross-correlation among subarrays, together with the auto-correlation, to achieve near-Cramer-Rao lower bound (CRLB) performance. To further reduce complexity, the PI-Max-CSCA method uses a fraction of all subarrays to make an initial coarse direction measurement (ICDM), applies the power iteration method over the total array to compute a more precise steering vector (SV), and finds a more accurate DOA value from the ICDM and SV through the maximum correlation method solved by successive convex approximation.
Simulation results show that as the number of antennas grows to large scale, the three proposed methods achieve a dramatic complexity reduction over conventional Root-MUSIC. In particular, PSCC and PI-Max-CSCA can reach the CRLB, while PSAC shows a substantial performance loss.

Submitted 10 August, 2022; v1 submitted 20 April, 2022; originally announced April 2022.
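The PI-Max-CSCA description relies on power iteration to obtain a steering-vector estimate from the full array. A minimal sketch of that ingredient, assuming a half-wavelength uniform linear array (ULA) and a single source: power iteration extracts the dominant eigenvector of the sample covariance, and the DOA is read off the average phase step between adjacent elements. The phase-step readout is a simple stand-in chosen here, not the paper's maximum-correlation/SCA refinement, and the subarray-based coarse initialization is omitted; all names and settings are hypothetical.

```python
import numpy as np

def steering_vector(m, theta_rad):
    """Half-wavelength ULA response: phase step of pi*sin(theta) between adjacent elements."""
    return np.exp(1j * np.pi * np.arange(m) * np.sin(theta_rad))

def dominant_eigvec(R, iters, rng):
    """Power iteration for the dominant eigenvector of a Hermitian matrix R."""
    v = rng.standard_normal(R.shape[0]) + 1j * rng.standard_normal(R.shape[0])
    for _ in range(iters):
        v = R @ v
        v /= np.linalg.norm(v)
    return v

def doa_from_vector(v):
    """Read the DOA off the average phase increment between adjacent elements."""
    phase_step = np.angle(np.sum(v[1:] * np.conj(v[:-1])))
    return np.arcsin(phase_step / np.pi)

rng = np.random.default_rng(0)
m, snapshots, theta_true = 64, 200, np.deg2rad(20.0)
noise_power = 0.1                                             # 10 dB per-element SNR
a = steering_vector(m, theta_true)
s = (rng.standard_normal(snapshots) + 1j * rng.standard_normal(snapshots)) / np.sqrt(2)
noise = np.sqrt(noise_power / 2) * (rng.standard_normal((m, snapshots))
                                    + 1j * rng.standard_normal((m, snapshots)))
X = np.outer(a, s) + noise                                    # received snapshots
R = X @ X.conj().T / snapshots                                # sample covariance
v = dominant_eigvec(R, iters=20, rng=rng)
print(f"estimated DOA = {np.rad2deg(doa_from_vector(v)):.3f} deg")
```

Each power-iteration step costs one matrix-vector product (O(M²) for M antennas), versus O(M³) for a full eigendecomposition, which is the complexity argument behind using it on very large arrays.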

arXiv:2201.04318 [pdf, other]

Knee Cartilage Defect Assessment by Graph Representation and Surface Convolution

Authors: Zixu Zhuang, Li** Si, Sheng Wang, Kai Xuan, Xi Ouyang, Yiqiang Zhan, Zhong Xue, Lichi Zhang, Dinggang Shen, Weiwu Yao, Qian Wang

Abstract: Knee osteoarthritis (OA) is the most common form of osteoarthritis and a leading cause of disability. Cartilage defects are regarded as major manifestations of knee OA and are visible on magnetic resonance imaging (MRI). Thus, early detection and assessment of knee cartilage defects are important for protecting patients from knee OA. Accordingly, many attempts have been made at knee cartilage defect assessment by applying convolutional neural networks (CNNs) to knee MRI. However, the physiologic characteristics of the cartilage may hinder such efforts: the cartilage is a thin curved layer, implying that only a small portion of voxels in knee MRI can contribute to the cartilage defect assessment; heterogeneous scanning protocols further challenge the feasibility of CNNs in clinical practice; and CNN-based knee cartilage evaluation results lack interpretability. To address these challenges, we model the cartilage's structure and appearance from knee MRI as a graph representation, which is capable of handling highly diverse clinical data. Then, guided by the cartilage graph representation, we design a non-Euclidean deep learning network with a self-attention mechanism to extract local and global cartilage features and to derive the final assessment with a visualized result. Our comprehensive experiments show that the proposed method yields superior performance in knee cartilage defect assessment, plus convenient 3D visualization for interpretability.

Submitted 12 January, 2022; originally announced January 2022.

Comments: 10 pages, 4 figures

arXiv:2112.06074 [pdf, other]

Early Stopping for Deep Image Prior

Authors: Hengkang Wang, Taihui Li, Zhong Zhuang, Tiancong Chen, Hengyue Liang, Ju Sun

Abstract: Deep image prior (DIP) and its variants have shown remarkable potential for solving inverse problems in computer vision, without any extra training data. Practical DIP models are often substantially overparameterized. During the fitting process, these models learn mostly the desired visual content first, and then pick up the potential modeling and observational noise, i.e., overfitting. Thus, the practicality of DIP often depends critically on good early stopping (ES) that captures the transition period. In this regard, most DIP works for vision tasks only demonstrate the potential of the models, reporting the peak performance against the ground truth, but provide no clue about how to operationally obtain near-peak performance without access to the ground truth. In this paper, we set out to break this practicality barrier of DIP and propose an efficient ES strategy, which consistently detects near-peak performance across several vision tasks and DIP variants. Based on a simple measure of dispersion of consecutive DIP reconstructions, our ES method not only outpaces the existing ones, which only work in very narrow domains, but also remains effective when combined with a number of methods that try to mitigate overfitting. The code is available at https://github.com/sun-umn/Early_Stop**_for_DIP.

Submitted 11 December, 2023; v1 submitted 11 December, 2021; originally announced December 2021.

Comments: Published in TMLR (https://openreview.net/forum?id=231ZzrLC8X)

Journal ref: Transactions on Machine Learning Research (TMLR), 2835-8856 (12/2023)
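The abstract describes the stopping signal only as "a simple measure of dispersion of consecutive DIP reconstructions". A minimal sketch, assuming that measure is the variance over a sliding window of the most recent reconstructions, with a patience rule to declare the minimum; `es_window_variance`, `window`, and `patience` are hypothetical, and the DIP fitting loop is replaced by a synthetic sequence whose noise first shrinks and then grows.

```python
from collections import deque
import numpy as np

def es_window_variance(reconstructions, window=10, patience=20):
    """Return the iteration at which windowed dispersion was lowest (hypothetical ES rule)."""
    buf = deque(maxlen=window)
    best_var, best_iter, since_best = np.inf, None, 0
    for t, x in enumerate(reconstructions):
        buf.append(np.asarray(x, dtype=float))
        if len(buf) < window:
            continue                                   # wait until the window is full
        stack = np.stack(buf)                          # (window, *image_shape)
        var = float(np.mean((stack - stack.mean(axis=0)) ** 2))
        if var < best_var:
            best_var, best_iter, since_best = var, t, 0
        else:
            since_best += 1
            if since_best >= patience:                 # no improvement for `patience` steps
                break
    return best_iter

# Synthetic stand-in for DIP iterates: noise shrinks until step 80, then grows (overfitting).
target = np.linspace(0.0, 1.0, 16)
iterates = (target + (abs(t - 80) / 80 + 0.01) * (-1) ** t for t in range(300))
print("stop and keep the reconstruction from iteration", es_window_variance(iterates))
```

Because the rule consumes iterates one at a time and only keeps a fixed-size window, it can run online alongside fitting and needs no access to the ground truth.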

arXiv:2110.12271 [pdf, other]

Self-Validation: Early Stopping for Single-Instance Deep Generative Priors

Authors: Taihui Li, Zhong Zhuang, Hengyue Liang, Le Peng, Hengkang Wang, Ju Sun

Abstract: Recent works have shown the surprising effectiveness of deep generative models in solving numerous image reconstruction (IR) tasks, even without training data. We collectively call these models, such as deep image prior and deep decoder, single-instance deep generative priors (SIDGPs). The successes, however, often hinge on appropriate early stopping (ES), which so far has largely been handled in an ad-hoc manner. In this paper, we propose the first principled method for ES when applying SIDGPs to IR, taking advantage of the typical bell-shaped trend of the reconstruction quality. In particular, our method is based on collaborative training and self-validation: the primal reconstruction process is monitored by a deep autoencoder, which is trained online with the historic reconstructed images and used to validate the reconstruction quality constantly. Experimentally, on several IR problems and different SIDGPs, our self-validation method reliably detects near-peak performance and signals good ES points. Our code is available at https://sun-umn.github.io/Self-Validation/.

Submitted 23 October, 2021; originally announced October 2021.

Comments: To appear in British Machine Vision Conference (BMVC) 2021

arXiv:2109.00154 [pdf, other]

DOA Estimation Using Massive Receive MIMO: Basic Principle and Key Techniques

Authors: Jiangzhou Wang, Baihua Shi, Feng Shu, Qi Zhang, Di Wu, Qijuan Jie, Zhihong Zhuang, Siling Feng, Yi** Zhang

Abstract: As massive multiple-input multiple-output (MIMO) becomes popular, direction-of-arrival (DOA) measurement has undergone a renaissance owing to the high resolution it achieves. DOA estimation using massive MIMO is therefore of clear interest. The purpose of this paper is to describe its basic principles and key techniques, to present the performance analysis, and to discuss its engineering applications. Many challenges remain in DOA estimation using massive receive MIMO, such as high circuit cost, high energy consumption, and high complexity of the algorithm implementation. New research and breakthroughs that deal with these problems are described. Then, a new architecture, hybrid analog and digital (HAD) massive receive MIMO with low-resolution ADCs, is presented to strike a good balance among circuit cost, complexity, and performance. Next, a novel three-dimensional (3D) angle-of-arrival (AOA) localization method based on the geometrical center is proposed to compute the position of a passive emitter using a single base station equipped with an ultra-massive MIMO system; this method can achieve the Cramér-Rao lower bound (CRLB). The performance loss is also analyzed to quantify the minimum number of bits. DOA estimation will play a key role in many applications, such as directional modulation, beamforming tracking, and alignment for 6G.

Submitted 15 July, 2023; v1 submitted 31 August, 2021; originally announced September 2021.
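As a point of reference for the DOA problem in the abstract above, here is a minimal sketch of classical conventional (Bartlett) beamforming DOA estimation on a uniform linear array with half-wavelength spacing. This textbook baseline is swapped in purely for illustration; it is not the HAD architecture with low-resolution ADCs or the 3D AOA localization method of the paper, and the function names, array size, and signal model are assumptions.

```python
import numpy as np


def ula_steering(n_antennas, theta_deg):
    """Steering vector of a half-wavelength-spaced uniform linear array (assumed geometry)."""
    n = np.arange(n_antennas)
    return np.exp(1j * np.pi * n * np.sin(np.deg2rad(theta_deg)))


def bartlett_doa(snapshots, grid_deg):
    """Estimate a single DOA by scanning the Bartlett spatial spectrum.

    snapshots: complex array of shape (n_antennas, n_snapshots).
    grid_deg: array of candidate angles in degrees.
    """
    n_antennas, n_snapshots = snapshots.shape
    # Sample covariance matrix R = X X^H / N.
    R = snapshots @ snapshots.conj().T / n_snapshots
    power = []
    for theta in grid_deg:
        a = ula_steering(n_antennas, theta)
        # Bartlett spectrum a^H R a is real because R is Hermitian.
        power.append(np.real(a.conj() @ R @ a))
    # The peak of the spatial spectrum is the DOA estimate.
    return grid_deg[int(np.argmax(power))]


# Usage: one source at 20 degrees, 64 antennas, 200 noisy snapshots.
rng = np.random.default_rng(1)
M, N, true_theta = 64, 200, 20.0
s = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
noise = 0.5 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
X = np.outer(ula_steering(M, true_theta), s) + noise
grid = np.arange(-90.0, 90.1, 0.1)
estimate = bartlett_doa(X, grid)
```

A fully digital array like this needs one RF chain and ADC per antenna, which is exactly the circuit-cost and energy burden that motivates the hybrid, low-resolution designs discussed in the abstract.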

arXiv:2106.04812 [pdf, other]

Phase Retrieval using Single-Instance Deep Generative Prior

Authors: Kshitij Tayal, Raunak Manekar, Zhong Zhuang, David Yang, Vipin Kumar, Felix Hofmann, Ju Sun

Abstract: Several deep learning methods for phase retrieval exist, but most of them fail on realistic data without precise support information. We propose a novel method based on single-instance deep generative prior that works well on complex-valued crystal data.

Submitted 22 June, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

arXiv:1911.09887 [pdf, other]

UAV-enabled Secure Communication with Finite Blocklength

Authors: Yuntian Wang, Xiaobo Zhou, Zhihong Zhuang, Linlin Sun, Yuwen Qian, **hui Lu, Feng Shu

Abstract: In the finite blocklength scenario, which is suitable for practical applications, a method of maximizing the average effective secrecy rate (AESR) is proposed for UAV-enabled secure communication by optimizing the UAV's trajectory and transmit power subject to the UAV's mobility and transmit power constraints. To address the formulated non-convex optimization problem, it is first decomposed into two non-convex subproblems. The two subproblems are then converted into two convex subproblems via first-order approximation. Finally, an alternating iteration algorithm is developed that solves the two subproblems iteratively using the successive convex approximation (SCA) technique. Numerical results show that our proposed scheme achieves better AESR performance than both benchmark schemes.

Submitted 26 March, 2020; v1 submitted 22 November, 2019; originally announced November 2019.
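The "first-order approximation" plus SCA step in the abstract above can be illustrated on a one-dimensional toy problem. This is a hypothetical sketch of the general idea (a difference-of-convex instance of SCA), not the paper's trajectory and power optimization: the objective f(x) = x**4 - 2*x**2 is written as g(x) - h(x) with both g and h convex, h is replaced by its first-order Taylor expansion at the current point, and the resulting convex surrogate is minimized in closed form.

```python
import numpy as np


def sca_dc_toy(x0, n_iter=100, tol=1e-12):
    """Minimize f(x) = x**4 - 2*x**2 by successive convex approximation (toy example).

    f = g - h with g(x) = x**4 and h(x) = 2*x**2, both convex.
    At x_k, h is replaced by its first-order Taylor expansion
    h(x_k) + 4*x_k*(x - x_k), which lower-bounds h, so the surrogate
    g(x) - h(x_k) - 4*x_k*(x - x_k) upper-bounds f, is convex, and is tight at x_k.
    Setting the surrogate's derivative 4*x**3 - 4*x_k to zero gives x_{k+1} = cbrt(x_k).
    """
    x = float(x0)
    history = [x**4 - 2 * x**2]  # objective value at each iterate
    for _ in range(n_iter):
        x_new = float(np.cbrt(x))
        history.append(x_new**4 - 2 * x_new**2)
        if abs(x_new - x) < tol:
            x = x_new
            break
        x = x_new
    return x, history


# Usage: starting from x0 = 0.2, the iterates approach the local minimizer x = 1.
x_opt, values = sca_dc_toy(0.2)
```

Because each surrogate upper-bounds the objective and touches it at the current point, the objective values are non-increasing from one iteration to the next, which is the property that makes SCA-style alternating schemes converge to a stationary point.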

arXiv:1908.09289 [pdf, ps, other]

UAV-Enabled Uplink Non-Orthogonal Multiple Access System: Joint Deployment and Power Control

Authors: **hui Lu, Yuntian Wang, Tingting Liu, Zhihong Zhuang, Xiaobo Zhou, Feng Shu, Zhu Han

Abstract: To overcome the inherent latency in multi-user unmanned aerial vehicle (UAV) networks with orthogonal multiple access (OMA), in this paper we investigate the UAV-enabled uplink non-orthogonal multiple access (NOMA) network, where a UAV is deployed to collect the messages transmitted by ground users. To maximize the sum rate of all users while meeting the quality of service (QoS) requirement, we formulate an optimization problem in which the UAV deployment position and the power control are jointly optimized. Because this problem is non-convex and some variables are binary, it is a typical NP-hard problem. An iterative algorithm is proposed with the assistance of the successive convex approximation (SCA) technique and the penalty function method. To reduce the high computational complexity of the iterative algorithm, a low-complexity approximation algorithm is then proposed, which achieves performance similar to that of the iterative algorithm. Numerical results show that, compared with the OMA scheme and the conventional NOMA scheme, our proposed algorithms can efficiently improve the sum rate.

Submitted 25 August, 2019; originally announced August 2019.

arXiv:1907.06817 [pdf, ps, other]

Energy-efficient Alternating Iterative Secure Structure of Maximizing Secrecy Rate for Directional Modulation Networks

Authors: Linlin Sun, Jiayu Li, Yu Zhang, Yuntian Wang, Linqing Gui, Fanyuan Li, Haochen Li, Zhihong Zhuang, Feng Shu

Abstract: In a directional modulation (DM) network, the issues of security and privacy have taken on an increasingly important role. Since the power allocation between the confidential message and artificial noise has a significant effect on system performance, it is important to jointly consider the beamforming vectors and the power allocation (PA) factors. To maximize the secrecy rate (SR), an alternating iterative structure (AIS) between the beamforming and PA is proposed. It converges rapidly to its rate ceiling within only two or three iterations. Simulation results indicate that the SR performance of the proposed AIS is much better than that of the null-space projection (NSP) based PA strategy in the medium and high signal-to-noise ratio (SNR) regions, especially when the number of antennas at the DM transmitter is small.

Submitted 15 July, 2019; originally announced July 2019.

arXiv:1905.04837 [pdf, ps, other]

Secure Hybrid Digital and Analog Precoder for mmWave Systems with low-resolution DACs and finite-quantized phase shifters

Authors: Ling Xu, Feng Shu, Guiyang Xia, Yi** Zhang, Zhihong Zhuang, Jiangzhou Wang

Abstract: Millimeter wave (mmWave) communication has been regarded as one of the most promising technologies for future-generation wireless networks because it provides ultra-wide new spectrum and ultra-high data transmission rates. To reduce the power consumption and circuit cost of mmWave systems, a hybrid digital and analog (HDA) architecture is preferred in such a scenario. In this paper, an artificial-noise (AN) aided secure HDA beamforming scheme is proposed for an mmWave MISO system with low-resolution digital-to-analog converters (DACs) and finite-quantized phase shifters on the RF side. The additive quantization noise model for the AN-aided HDA system is established to analyze the secrecy performance of such systems. With partial channel knowledge of the eavesdropper available, an approximate expression of the secrecy rate (SR) is derived. Then, using this approximation, we propose a two-layer alternately iterative structure (TLAIS) for optimizing the digital precoder (DP) of the confidential message (CM), the digital AN projection matrix (DANPM), and the analog precoder (AP). The inner-layer iteration loop alternately designs the DP of CMs and the DANPM given a fixed AP matrix. The outer-layer iteration loop alternates between the digital baseband part and the analog part, where the former refers to the DP and DANPM, and the latter to the AP. For a given digital part, we propose a gradient ascent algorithm to find the AP vector. Given an AP matrix, we use the general power iteration (GPI) method to compute the DP and DANPM. This process is repeated until the termination condition is reached. Simulation results show that the proposed TLAIS achieves better SR performance than existing methods, especially in the high signal-to-noise ratio region.

Submitted 12 May, 2019; originally announced May 2019.

Comments: 11 pages, 7 figures
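The power-iteration step mentioned in the precoder abstract above can be illustrated with its simplest relative: maximizing a generalized Rayleigh quotient w^H A w / w^H B w (the form an SLNR-type beamforming subproblem takes) by repeatedly applying B^{-1} A and normalizing. This is a simplified stand-in for illustration only; the paper's GPI formulation for the DP and DANPM may differ, and the function name and the random A, B in the usage lines are hypothetical.

```python
import numpy as np


def generalized_power_iteration(A, B, n_iter=500, tol=1e-10, seed=0):
    """Maximize w^H A w / w^H B w for Hermitian PSD A and Hermitian PD B.

    Iterates w <- normalize(B^{-1} A w), i.e. power iteration on B^{-1} A,
    which converges to the dominant generalized eigenvector when the largest
    generalized eigenvalue is strictly larger than the others.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    w = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        # Solve B w_new = A w instead of forming B^{-1} explicitly.
        w_new = np.linalg.solve(B, A @ w)
        w_new /= np.linalg.norm(w_new)
        # Stop when the direction no longer changes (up to a phase factor).
        if 1.0 - abs(np.vdot(w, w_new)) < tol:
            w = w_new
            break
        w = w_new
    ratio = np.real(np.vdot(w, A @ w) / np.vdot(w, B @ w))
    return w, ratio


# Usage with random Hermitian matrices (hypothetical "signal" A and "leakage plus noise" B).
rng = np.random.default_rng(2)
n = 6
H = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = H @ H.conj().T
B = G @ G.conj().T + n * np.eye(n)
w_opt, best_ratio = generalized_power_iteration(A, B)
```

Each iteration costs one linear solve, which is why power-iteration-style updates are attractive inside an outer alternating loop that must call them many times.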

arXiv:1808.00646 [pdf, ps, other]

doi 10.1109/JSYST.2019.2918168

Power Allocation Strategies for Secure Spatial Modulation

Authors: Guiyang Xia, Linqiong Jia, Yuwen Qian, Feng Shu, Zhihong Zhuang, Jiangzhou Wang

Abstract: In this paper, power allocation (PA) strategies are investigated for secure spatial modulation (SM) networks under a total power constraint. Since there is no closed-form expression for the secrecy rate (SR), an approximate closed-form expression of SR is presented, which serves as an efficient metric for optimizing the PA factor and can greatly reduce the computational complexity. Based on this expression, a convex optimization (CO) method of maximizing SR (Max-SR) is proposed. Furthermore, a method of maximizing the product of the signal-to-leakage-and-noise ratio (SLNR) and the artificial noise-to-leakage-and-noise ratio (ANLNR) (Max-P-SAN) is proposed to provide an analytic solution to PA with extremely low complexity. Simulation results demonstrate that the SR performance of the proposed CO method is close to that of the optimal PA strategy of Max-SR with exhaustive search and better than that of Max-P-SAN in the high signal-to-noise ratio (SNR) region. However, in the low and medium SNR regions, the SR performance of the proposed Max-P-SAN slightly exceeds that of the proposed CO method.

Submitted 1 August, 2018; originally announced August 2018.

arXiv:1807.03534 [pdf, other]

doi 10.1109/ACCESS.2018.2852636

Underwater Source Localization Using TDOA and FDOA Measurements with Unknown Propagation Speed and Sensor Parameter Errors

Authors: Bingbing Zhang, Yongchang Hu, Hongyi Wang, Zhaowen Zhuang

Abstract: Underwater source localization problems are complicated and challenging: a) the sound propagation speed is often unknown, and unpredictable ocean currents might lead to uncertainties in sensor parameters (i.e., position and velocity); b) the underwater acoustic signal travels much slower than radio signals in terrestrial environments, resulting in a significantly more severe Doppler effect; c) energy-efficient techniques are urgently required, which favours designs with low computational complexity. Considering these issues, we propose a simple and efficient underwater source localization approach based on time difference of arrival (TDOA) and frequency difference of arrival (FDOA) measurements, which copes with unknown propagation speed and sensor parameter errors. The proposed method mitigates the impact of the Doppler effect to accurately infer the source parameters (i.e., position and velocity). The Cramér-Rao lower bounds (CRLBs) for this kind of localization are derived, and the analytical study shows that our method can yield performance very close to the CRLB, particularly under small noise. The numerical results not only confirm the above conclusions but also show that our method outperforms other competing approaches.

Submitted 24 July, 2018; v1 submitted 10 July, 2018; originally announced July 2018.

Journal ref: IEEE Access 6(1):36645-36661, 2018
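To make the TDOA/FDOA measurements in the underwater localization abstract above concrete, the sketch below computes noise-free TDOAs and range-rate differences for a moving source and moving sensors, using sensor 1 as the reference. It is the standard geometric measurement model under the assumption of a known propagation speed, not the paper's estimator (which additionally handles unknown speed and sensor parameter errors); the function name and example geometry are hypothetical.

```python
import numpy as np


def tdoa_fdoa_model(src_pos, src_vel, sens_pos, sens_vel, c):
    """Noise-free TDOA and range-rate-difference measurements (hypothetical helper).

    src_pos, src_vel: shape (3,) source position (m) and velocity (m/s).
    sens_pos, sens_vel: shape (M, 3) sensor positions and velocities.
    c: propagation speed in m/s (roughly 1500 m/s for sound in seawater).
    Returns (tdoa, rrd) for sensors 2..M relative to sensor 1:
      tdoa[i] = (r_i - r_1) / c          in seconds
      rrd[i]  = dr_i/dt - dr_1/dt        in m/s
    The FDOA is rrd scaled by carrier_freq / c (its sign depends on convention).
    """
    diff = src_pos - sens_pos                    # (M, 3) sensor-to-source vectors
    ranges = np.linalg.norm(diff, axis=1)        # r_i = ||s - u_i||
    # Range rate: relative velocity projected onto the line of sight.
    rates = np.sum(diff * (src_vel - sens_vel), axis=1) / ranges
    tdoa = (ranges[1:] - ranges[0]) / c
    rrd = rates[1:] - rates[0]
    return tdoa, rrd


# Usage: two stationary sensors and a source moving at 10 m/s along y.
src = np.array([300.0, 400.0, 0.0])
vel = np.array([0.0, 10.0, 0.0])
sensors = np.array([[0.0, 0.0, 0.0], [300.0, 0.0, 0.0]])
tdoa, rrd = tdoa_fdoa_model(src, vel, sensors, np.zeros_like(sensors), 1500.0)
```

Here the ranges are 500 m and 400 m and the range rates are 8 m/s and 10 m/s, so the TDOA is -100/1500 s and the range-rate difference is 2 m/s. Because c appears in both quantities, an unknown propagation speed couples directly into both measurement types, which is one reason the paper treats it as an extra unknown.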

Showing 1–24 of 24 results for author: Zhuang, Z