Search | arXiv e-print repository

Neural Network-based Two-Dimensional Filtering for OTFS Symbol Detection

Authors: Jiarui Xu, Karim Said, Lizhong Zheng, Lingjia Liu

Abstract: Orthogonal time frequency space (OTFS) is a promising modulation scheme for wireless communication in high-mobility scenarios. Recently, a reservoir computing (RC) based approach has been introduced for online subframe-based symbol detection in the OTFS system, where only the limited over-the-air (OTA) pilot symbols are utilized for training. However, the previous RC-based approach does not design the RC architecture based on the properties of the OTFS system to fully unlock the potential of RC. This paper introduces a novel two-dimensional RC (2D-RC) approach for online symbol detection on a subframe basis in the OTFS system. The 2D-RC is designed to have a two-dimensional (2D) filtering structure to equalize the 2D circular channel effect in the delay-Doppler (DD) domain of the OTFS system. With the introduced architecture, the 2D-RC can operate in the DD domain with only a single neural network, unlike our previous work which requires multiple RCs to track channel variations in the time domain. Experimental results demonstrate the advantages of the 2D-RC approach over the previous RC-based approach and the compared model-based methods across different modulation orders.

Submitted 8 March, 2024; originally announced June 2024.

Comments: 6 pages, conference paper. arXiv admin note: substantial text overlap with arXiv:2311.08543

arXiv:2406.16711 [pdf]

Generalized Modal Analysis in Power System with High CIG Penetration: Concept and Quantitative Assessment

Authors: Le Zheng, Jiajie Zheng, Chongru Liu

Abstract: This paper presents a Generalized Modal Analysis (GMA) concept for the small-signal stability analysis of power systems with high penetration of Converter-Interfaced Generation (CIG). GMA quantitatively assesses interactions between various elements in the power system, offering intuitive and transparent physical interpretations. The method's versatility in selecting physical quantities at different input and output ports makes it broadly applicable. Based on the concept of GMA, the study further defines interaction quantification indices by selecting voltage ports, examining the impact of grid disturbances on power sources and the support from the power sources to the grid at connection points. Numerical simulations on modified 14-bus and 68-bus systems validate GMA's effectiveness in capturing the coupling of the dynamic characteristics between grid elements. This research provides a theoretical foundation and analytical framework for future analyses of power system stability with diverse power sources.

Submitted 24 June, 2024; originally announced June 2024.

Comments: submitted to IEEE Transactions on Power Systems for peer-review

arXiv:2406.11163 [pdf, other]

Explainable Bayesian Recurrent Neural Smoother to Capture Global State Evolutionary Correlations

Authors: Shi Yan, Yan Liang, Huayu Zhang, Le Zheng, Difan Zou, Binglu Wang

Abstract: By integrating the evolutionary correlations across global states in the bidirectional recursion, an explainable Bayesian recurrent neural smoother (EBRNS) is proposed for offline data-assisted fixed-interval state smoothing. First, the proposed model, containing global states in the evolutionary interval, is transformed into an equivalent model with bidirectional memory. This transformation incorporates crucial global state information with support for bidirectional recursive computation. For the transformed model, the joint state-memory-trend Bayesian filtering and smoothing frameworks are derived by introducing the bidirectional memory iteration mechanism and offline data into Bayesian estimation theory. The derived frameworks are implemented using the Gaussian approximation to ensure analytical properties and computational efficiency. Finally, the neural network modules within EBRNS and its two-stage training scheme are designed. Unlike most existing approaches that artificially combine deep learning and model-based estimation, the bidirectional recursion and internal gated structures of EBRNS are naturally derived from Bayesian estimation theory, explainably integrating prior model knowledge, online measurement, and offline data. Experiments on representative real-world datasets demonstrate that the high smoothing accuracy of EBRNS is accompanied by data efficiency and a lightweight parameter scale.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.00421 [pdf]

Modal Analysis of Power System with High CIG Penetration Based on Impedance Models

Authors: Le Zheng, Jiajie Zheng, Jiajian Lin, Chongru Liu

Abstract: This paper explores the modal analysis of power systems with high Converter-Interfaced Generation (CIG) penetration utilizing an impedance-based modeling approach. Traditional modal analysis based on the state-space model (MASS) requires comprehensive control structures and parameters of each system element, a challenging prerequisite as converters increasingly integrate into power systems and their internal specifics remain largely inaccessible. Conversely, the proposed modal analysis based on the impedance model (MAI) leverages only the impedance port characteristics to pinpoint system elements significantly influencing unstable modes. This study is the first to confirm the theoretical equivalency between MASS and MAI in terms of transfer functions, eigenvalues, and sensitivities, thus bridging the gap between detailed theoretical modeling and practical, accessible analyses. We further provide enhancements to the MAI method, including a revised element participation index, a transformer ratio-based admittance sensitivity adjustment, and an impedance splitting-based sensitivity analysis considering parameter variations. Validation through numerical simulations on a modified IEEE 14-bus system underscores the efficacy of our approach. By examining the interplay between different elements and system modes in high CIG environments, this study offers insights and a foundational framework for delineating the oscillatory modes' participation and stability characteristics of power systems with substantial CIG integration.

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.07029 [pdf]

A framework of text-dependent speaker verification for chinese numerical string corpus

Authors: Litong Zheng, Feng Hong, Weijie Xu, Wan Zheng

Abstract: The Chinese numerical string corpus serves as a valuable resource for speaker verification, particularly in financial transactions. Research indicates that in short speech scenarios, text-dependent speaker verification (TD-SV) consistently outperforms text-independent speaker verification (TI-SV). However, TD-SV potentially includes the validation of text information, which can be negatively impacted by reading rhythms and pauses. To address this problem, we propose an end-to-end speaker verification system that enhances TD-SV by decoupling speaker and text information. Our system consists of a text embedding extractor, a speaker embedding extractor and a fusion module. In the text embedding extractor, we employ an enhanced Transformer and introduce a triple loss including text classification loss, connectionist temporal classification (CTC) loss and decoder loss; in the speaker embedding extractor, we create a multi-scale pooling method by combining sliding window attentive statistics pooling (SWASP) with attentive statistics pooling (ASP). To mitigate the scarcity of data, we have recorded a publicly available Chinese numerical corpus named SHALCAS22A (hereinafter called SHAL), which can be accessed on Open-SLR. Moreover, we employ data augmentation techniques using Tacotron2 and HiFi-GAN. Our method achieves an equal error rate (EER) performance improvement of 49.2% on Hi-Mia and 75.0% on SHAL, respectively.

Submitted 21 May, 2024; v1 submitted 11 May, 2024; originally announced May 2024.

Comments: arXiv admin note: text overlap with arXiv:2312.01645

arXiv:2403.16643 [pdf, other]

Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution

Authors: Qingping Zheng, Ling Zheng, Yuanfan Guo, Ying Li, Songcen Xu, Jiankang Deng, Hang Xu

Abstract: Artifact-free super-resolution (SR) aims to translate low-resolution images into their high-resolution counterparts with a strict integrity of the original content, eliminating any distortions or synthetic details. While traditional diffusion-based SR techniques have demonstrated remarkable abilities to enhance image detail, they are prone to artifact introduction during iterative procedures. Such artifacts, ranging from trivial noise to unauthentic textures, deviate from the true structure of the source image, thus challenging the integrity of the super-resolution process. In this work, we propose Self-Adaptive Reality-Guided Diffusion (SARGD), a training-free method that delves into the latent space to effectively identify and mitigate the propagation of artifacts. Our SARGD begins by using an artifact detector to identify implausible pixels, creating a binary mask that highlights artifacts. Following this, the Reality Guidance Refinement (RGR) process refines artifacts by integrating this mask with realistic latent representations, improving alignment with the original image. Nonetheless, initial realistic-latent representations from lower-quality images result in over-smoothing in the final output. To address this, we introduce a Self-Adaptive Guidance (SAG) mechanism. It dynamically computes a reality score, enhancing the sharpness of the realistic latent. These alternating mechanisms collectively achieve artifact-free super-resolution. Extensive experiments demonstrate the superiority of our method, delivering detailed artifact-free high-resolution images while reducing sampling steps by 2X. We release our code at https://github.com/ProAirVerse/Self-Adaptive-Guidance-Diffusion.git.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.10520 [pdf, other]

Strong and Controllable Blind Image Decomposition

Authors: Zeyu Zhang, Junlin Han, Chenhui Gou, Hongdong Li, Liang Zheng

Abstract: Blind image decomposition aims to decompose all components present in an image, typically used to restore a multi-degraded input image. While fully recovering the clean image is appealing, in some scenarios, users might want to retain certain degradations, such as watermarks, for copyright protection. To address this need, we add controllability to the blind image decomposition process, allowing users to enter which types of degradation to remove or retain. We design an architecture named controllable blind image decomposition network. Inserted in the middle of the U-Net structure, our method first decomposes the input feature maps and then recombines them according to user instructions. Advantageously, this functionality is implemented at minimal computational cost: decomposition and recombination are both parameter-free. Experimentally, our system excels in blind image decomposition tasks and can output partially or fully restored images that reflect user intentions well. Furthermore, we evaluate and configure different options for the network structure and loss functions. This, combined with the proposed decomposition-and-recombination method, yields an efficient and competitive system for blind image decomposition, compared with current state-of-the-art methods.

Submitted 15 March, 2024; originally announced March 2024.

Comments: Code: https://github.com/Zhangzeyu97/CBD.git

arXiv:2403.03675 [pdf, other]

ZF Beamforming Tensor Compression for Massive MIMO Fronthaul

Authors: Libin Zheng, Zihao Wang, Minru Bai, Zhenjie Tan

Abstract: In the rapidly evolving landscape of 5G and beyond 5G (B5G) mobile cellular communications, efficient data compression and reconstruction strategies become paramount, especially in massive multiple-input multiple-output (MIMO) systems. A critical challenge in these systems is the capacity-limited fronthaul, particularly in the context of the Ethernet-based common public radio interface (eCPRI) connecting baseband units (BBUs) and remote radio units (RRUs). This capacity limitation hinders the effective handling of increased traffic and data flows. We propose a novel two-stage compression approach to address this bottleneck. The first stage employs sparse Tucker decomposition, targeting the weight tensor's low-rank components for compression. The second stage further compresses these components using complex Givens decomposition and run-length encoding, substantially improving the compression ratio. Our approach specifically targets the Zero-Forcing (ZF) beamforming weights in BBUs. By reconstructing these weights in RRUs, we significantly alleviate the burden on eCPRI traffic, enabling a higher number of concurrent streams in the radio access network (RAN). Through comprehensive evaluations, we demonstrate the superior effectiveness of our method in Channel State Information (CSI) compression, paving the way for more efficient 5G/B5G fronthaul links.

Submitted 6 March, 2024; originally announced March 2024.
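
Illustrative sketch (not taken from the paper): the abstract above names run-length encoding as part of its second compression stage. The snippet below shows only generic run-length encoding on a sequence that is assumed to be already quantized, so that repeated values form runs. The function names rle_encode and rle_decode and the sample data are hypothetical; the paper's actual encoder, quantizer, and bit layout are not described in this listing.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    decoded = []
    for value, run_length in pairs:
        decoded.extend([value] * run_length)
    return decoded


# Hypothetical quantized coefficients with long runs of zeros.
quantized = [0, 0, 0, 5, 5, 1]
encoded = rle_encode(quantized)   # [(0, 3), (5, 2), (1, 1)]
restored = rle_decode(encoded)    # equals the original list
```

Run-length encoding only pays off when the input contains long runs, which is why a sparsifying step would typically come first.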

arXiv:2403.02651 [pdf, other]

Learning at the Speed of Wireless: Online Real-Time Learning for AI-Enabled MIMO in NextG

Authors: Jiarui Xu, Shashank Jere, Yifei Song, Yi-Hung Kao, Lizhong Zheng, Lingjia Liu

Abstract: Integration of artificial intelligence (AI) and machine learning (ML) into the air interface has been envisioned as a key technology for next-generation (NextG) cellular networks. At the air interface, multiple-input multiple-output (MIMO) and its variants such as multi-user MIMO (MU-MIMO) and massive/full-dimension MIMO have been key enablers across successive generations of cellular networks with evolving complexity and design challenges. Initiating active investigation into leveraging AI/ML tools to address these challenges for MIMO becomes a critical step towards an AI-enabled NextG air interface. At the NextG air interface, the underlying wireless environment will be extremely dynamic with operation adaptations performed on a sub-millisecond basis by MIMO operations such as MU-MIMO scheduling and rank/link adaptation. Given the enormously large number of operation adaptation possibilities, we contend that online real-time AI/ML-based approaches constitute a promising paradigm. To this end, we outline the inherent challenges and offer insights into the design of such online real-time AI/ML-based solutions for MIMO operations. An online real-time AI/ML-based method for MIMO-OFDM channel estimation is then presented, serving as a potential roadmap for developing similar techniques across various MIMO operations in NextG.

Submitted 4 March, 2024; originally announced March 2024.

Comments: 7 pages, 4 figures, 1 table, magazine paper

arXiv:2402.04704 [pdf, ps, other]

doi 10.1109/TCOMM.2023.3325472

Device Activity Detection and Channel Estimation for Millimeter-Wave Massive MIMO

Authors: Yinchuan Li, Yuancheng Zhan, Le Zheng, Xiaodong Wang

Abstract: Millimeter-Wave Massive MIMO is important for beyond 5G or 6G wireless communication networks. The goal of this paper is to establish successful communication between the cellular base stations and devices, focusing on the problem of joint user activity detection and channel estimation. Different from traditional compressed sensing (CS) methods that only use the sparsity of user activities, we develop several Approximate Message Passing (AMP) based CS algorithms by exploiting the sparsity of user activities and mmWave channels. First, a group soft-thresholding AMP is presented to utilize only the user activity sparsity. Second, a hard-thresholding AMP is proposed based on the on-grid CS approach. Third, a super-resolution AMP algorithm is proposed based on atomic norm, in which a greedy method is proposed as a super-resolution denoiser. We also smooth the denoiser based on Monte Carlo sampling to obtain Lipschitz continuity and present state evolution results. Extensive simulation results show that the proposed method outperforms the previous state-of-the-art methods.

Submitted 7 February, 2024; originally announced February 2024.

Comments: Published in: IEEE Transactions on Communications, 2023
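
Illustrative sketch (not taken from the paper): the abstract above mentions a group soft-thresholding AMP that exploits user activity sparsity. The snippet below shows only the standard row-wise group soft-thresholding operator, under the assumption that each row of the input holds one device's channel estimate, so an inactive device corresponds to an all-zero row. The name group_soft_threshold, the threshold tau, and the sample matrix are hypothetical, and the rest of the AMP iteration (for example the Onsager correction) is omitted.

```python
import numpy as np


def group_soft_threshold(rows, tau):
    """Shrink each row toward zero by tau in Euclidean norm.

    Rows whose norm is at most tau are set to zero; larger rows keep
    their direction and lose tau from their norm.
    """
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    safe_norms = np.maximum(norms, 1e-12)
    scale = np.maximum(0.0, 1.0 - tau / safe_norms)
    return scale * rows


# Hypothetical noisy estimates: one strong (active) row, one weak row.
noisy = np.array([[3.0, 4.0], [0.1, 0.1]])
denoised = group_soft_threshold(noisy, tau=1.0)
# The first row has norm 5 and is scaled by 0.8; the second row is zeroed.
```

Thresholding whole rows rather than individual entries is what ties the sparsity pattern to device activity in this kind of formulation.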

arXiv:2312.01645 [pdf]

A text-dependent speaker verification application framework based on Chinese numerical string corpus

Authors: Litong Zheng, Feng Hong, Weijie Xu

Abstract: Research indicates that text-dependent speaker verification (TD-SV) often outperforms text-independent verification (TI-SV) in short speech scenarios. However, collecting large-scale fixed text speech data is challenging, and as speech length increases, factors like sentence rhythm and pauses affect TD-SV's sensitivity to text sequence. Based on these factors, we propose the hypothesis that strategies such as more fine-grained pooling methods on time scales and decoupled representations of speech speaker embedding and text embedding are more suitable for TD-SV. We have introduced an end-to-end TD-SV system based on a dataset comprising longer Chinese numerical string texts. It contains a text embedding network, a speaker embedding network, and back-end fusion. First, we recorded a dataset consisting of long Chinese numerical text named SHAL, which is publicly available on the Open-SLR website. We addressed the issue of dataset scarcity by augmenting it using Tacotron2 and HiFi-GAN. Next, we introduced a dual representation of speech with text embedding and speaker embedding. In the text embedding network, we employed an enhanced Transformer and introduced a triple loss that includes text classification loss, CTC loss, and decoder loss. For the speaker embedding network, we enhanced a sliding window attentive statistics pooling (SWASP), combined with attentive statistics pooling (ASP) to create a multi-scale pooling method. Finally, we fused text embedding and speaker embedding. Our pooling methods achieved an equal error rate (EER) performance improvement of 49.2% on Hi-Mia and 75.0% on SHAL, respectively.

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2311.18788 [pdf, other]

doi 10.1016/j.media.2020.101942

Automated interpretation of congenital heart disease from multi-view echocardiograms

Authors: Jing Wang, Xiaofeng Liu, Fangyun Wang, Lin Zheng, Fengqiao Gao, Hanwen Zhang, Xin Zhang, Wanqing Xie, Binbin Wang

Abstract: Congenital heart disease (CHD) is the most common birth defect and the leading cause of neonate death in China. Clinical diagnosis can be based on the selected 2D key-frames from five views. Limited by the availability of multi-view data, most methods have to rely on the insufficient single view analysis. This study proposes to automatically analyze the multi-view echocardiograms with a practical end-to-end framework. We collect the five-view echocardiogram video records of 1308 subjects (including normal controls, ventricular septal defect (VSD) patients and atrial septal defect (ASD) patients) with both disease labels and standard-view key-frame labels. Depthwise separable convolution-based multi-channel networks are adopted to largely reduce the network parameters. We also approach the imbalanced class problem by augmenting the positive training samples. Our 2D key-frame model can diagnose CHD or negative samples with an accuracy of 95.4%, and classify negative, VSD or ASD samples with an accuracy of 92.3%. To further alleviate the work of key-frame selection in real-world implementation, we propose an adaptive soft attention scheme to directly explore the raw video data. Four kinds of neural aggregation methods are systematically investigated to fuse the information of an arbitrary number of frames in a video. Moreover, with a view detection module, the system can work without the view records. Our video-based model can diagnose with an accuracy of 93.9% (binary classification) and 92.1% (3-class classification) in a collected 2D video testing set, which does not need key-frame selection and view annotation in testing. The detailed ablation study and the interpretability analysis are provided.

Submitted 30 November, 2023; originally announced November 2023.

Comments: Published in Medical Image Analysis

Journal ref: Medical Image Analysis (Volume 69, April 2021, 101942)

arXiv:2311.08543 [pdf, other]

2D-RC: Two-Dimensional Neural Network Approach for OTFS Symbol Detection

Authors: Jiarui Xu, Karim Said, Lizhong Zheng, Lingjia Liu

Abstract: Orthogonal time frequency space (OTFS) is a promising modulation scheme for wireless communication in high-mobility scenarios. Recently, a reservoir computing (RC) based approach has been introduced for online subframe-based symbol detection in the OTFS system, where only a limited number of over-the-air (OTA) pilot symbols are utilized for training. However, this approach does not leverage the domain knowledge specific to the OTFS system to fully unlock the potential of RC. This paper introduces a novel two-dimensional RC (2D-RC) method that incorporates the domain knowledge of the OTFS system into the design for symbol detection in an online subframe-based manner. Specifically, as the channel interaction in the delay-Doppler (DD) domain is a two-dimensional (2D) circular operation, the 2D-RC is designed to have the 2D circular padding procedure and the 2D filtering structure to embed this knowledge. With the introduced architecture, 2D-RC can operate in the DD domain with only a single neural network, instead of necessitating multiple RCs to track channel variations in the time domain as in previous work. Numerical experiments demonstrate the advantages of the 2D-RC approach over the previous RC-based approach and compared model-based methods across different OTFS system variants and modulation orders.

Submitted 24 January, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

Comments: 15 pages, journal submission
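
Illustrative sketch (not taken from the paper): the abstract above states that channel interaction in the DD domain is a 2D circular operation, which motivates 2D circular padding followed by 2D filtering. The snippet below shows only that generic idea in NumPy: wrap-padding a grid and sliding a small kernel over it reproduces 2D circular convolution. The function name circular_filter_2d, the grid size, and the random kernel are hypothetical, and this is not the 2D-RC architecture itself.

```python
import numpy as np


def circular_filter_2d(grid, kernel):
    """2D circular convolution via wrap padding plus a sliding window.

    Computes out[m, n] = sum_{i, j} kernel[i, j] * grid[(m - i) % M, (n - j) % N].
    """
    rows, cols = grid.shape
    k_rows, k_cols = kernel.shape
    # Wrap-pad the top and left edges so shifted windows see circular neighbours.
    padded = np.pad(grid, ((k_rows - 1, 0), (k_cols - 1, 0)), mode="wrap")
    out = np.zeros(grid.shape, dtype=np.result_type(grid, kernel))
    for i in range(k_rows):
        for j in range(k_cols):
            r0 = k_rows - 1 - i
            c0 = k_cols - 1 - j
            # This window equals the grid circularly shifted by (i, j).
            out += kernel[i, j] * padded[r0:r0 + rows, c0:c0 + cols]
    return out


rng = np.random.default_rng(0)
grid = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))
kernel = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
filtered = circular_filter_2d(grid, kernel)

# Cross-check against the FFT form of circular convolution.
reference = np.fft.ifft2(np.fft.fft2(grid) * np.fft.fft2(kernel, s=grid.shape))
```

The FFT comparison is there as a sanity check: multiplication in the 2D frequency domain is equivalent to circular convolution on the grid, so the two results should agree to numerical precision.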

arXiv:2310.17414 [pdf, other]

LEI2JSON: Schema-based Validation and Conversion of Livestock Event Information

Authors: Mahir Habib, Muhammad Ashad Kabir, Lihong Zheng

Abstract: Livestock producers often need help in standardising (i.e., converting and validating) their livestock event data. This article introduces a novel solution, LEI2JSON (Livestock Event Information To JSON). The tool is an add-on for Google Sheets, adhering to the livestock event information (LEI) schema. The core objective of LEI2JSON is to provide livestock producers with an efficient mechanism to standardise their data, leading to substantial savings in time and resources. This is achieved by building the spreadsheet template with the appropriate column headers, notes, and validation rules, converting the spreadsheet data into JSON format, and validating the output against the schema. LEI2JSON facilitates the seamless storage of livestock event information locally or on Google Drive in JSON. Additionally, we have conducted an extensive experimental evaluation to assess the effectiveness of the tool.

Submitted 26 October, 2023; originally announced October 2023.

Comments: 20 pages, 6 figures

arXiv:2310.17187 [pdf, other]

Explainable Gated Bayesian Recurrent Neural Network for Non-Markov State Estimation

Authors: Shi Yan, Yan Liang, Le Zheng, Mingyang Fan, Xiaoxu Wang, Binglu Wang

Abstract: The optimality of Bayesian filtering relies on the completeness of prior models, while deep learning holds a distinct advantage in learning models from offline data. Nevertheless, the current fusion of these two methodologies remains largely ad hoc, lacking a theoretical foundation. This paper presents a novel solution, namely an explainable gated Bayesian recurrent neural network specifically designed for state estimation under model mismatches. Firstly, we transform the non-Markov state-space model into an equivalent first-order Markov model with memory. It is a generalized transformation that overcomes the limitations of the first-order Markov property and enables recursive filtering. Secondly, by deriving a data-assisted joint state-memory-mismatch Bayesian filtering, we design a Bayesian gated framework that includes a memory update gate for capturing the temporal regularities in state evolution, a state prediction gate with the evolution mismatch compensation, and a state update gate with the observation mismatch compensation. The Gaussian approximation implementation of the filtering process within the gated framework is derived, taking into account the computational efficiency. Finally, the corresponding internal neural network structures and end-to-end training methods are designed. The Bayesian filtering theory enhances the interpretability of the proposed gated network, enabling the effective integration of offline data and prior models within functionally explicit gated units. In comprehensive experiments, including simulations and real-world datasets, the proposed gated network demonstrates superior estimation performance compared to benchmark filters and state-of-the-art deep learning filtering methods.

Submitted 7 March, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.04956 [pdf, ps, other]

Towards Explainable Machine Learning: The Effectiveness of Reservoir Computing in Wireless Receive Processing

Authors: Shashank Jere, Karim Said, Lizhong Zheng, Lingjia Liu

Abstract: Deep learning has seen a rapid adoption in a variety of wireless communications applications, including at the physical layer. While it has delivered impressive performance in tasks such as channel equalization and receive processing/symbol detection, it leaves much to be desired when it comes to explaining this superior performance. In this work, we investigate the specific task of channel equalization by applying a popular learning-based technique known as Reservoir Computing (RC), which has shown superior performance compared to conventional methods and other learning-based approaches. Specifically, we apply the echo state network (ESN) as a channel equalizer and provide a first principles-based signal processing understanding of its operation. With this groundwork, we incorporate the available domain knowledge in the form of the statistics of the wireless channel directly into the weights of the ESN model. This paves the way for optimized initialization of the ESN model weights, which are traditionally untrained and randomly initialized. Finally, we show the improvement in receive processing/symbol detection performance with this optimized initialization through simulations. This is a first step towards explainable machine learning (XML) and assigning practical model interpretability that can be utilized together with the available domain knowledge to improve performance and enhance detection reliability.

Submitted 7 October, 2023; originally announced October 2023.

Comments: This work has been accepted to IEEE Military Communications Conference (MILCOM) 2023. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2309.16457 [pdf, other]

SI-SD: Sleep Interpreter through awake-guided cross-subject Semantic Decoding

Authors: Hui Zheng, Zhong-Tao Chen, Hai-Teng Wang, Jian-Yang Zhou, Lin Zheng, Pei-Yang Lin, Yun-Zhe Liu

Abstract: Understanding semantic content from brain activity during sleep represents a major goal in neuroscience. While studies in rodents have shown spontaneous neural reactivation of memories during sleep, capturing the semantic content of human sleep poses a significant challenge due to the absence of well-annotated sleep datasets and the substantial differences in neural patterns between wakefulness and sleep. To address these challenges, we designed a novel cognitive neuroscience experiment and collected a comprehensive, well-annotated electroencephalography (EEG) dataset from 134 subjects during both wakefulness and sleep. Leveraging this benchmark dataset, we developed SI-SD that enhances sleep semantic decoding through the position-wise alignment of neural latent sequence between wakefulness and sleep. In the 15-way classification task, our model achieves 24.12% and 21.39% top-1 accuracy on unseen subjects for NREM 2/3 and REM sleep, respectively, surpassing all other baselines. With additional fine-tuning, decoding performance improves to 30.32% and 31.65%, respectively. Besides, inspired by previous neuroscientific findings, we systematically analyze how the "Slow Oscillation" event impacts decoding performance in NREM 2/3 sleep -- decoding performance on unseen subjects further improves to 40.02%. Together, our findings and methodologies contribute to a promising neuro-AI framework for decoding brain activity during sleep.

Submitted 19 May, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

arXiv:2309.13856 [pdf, other]

doi 10.1109/TVT.2023.3319538

DNN-DANM: A High-Accuracy Two-Dimensional DOA Estimation Method Using Practical RIS

Authors: Zhimin Chen, Peng Chen, Le Zheng, Yudong Zhang

Abstract: Reconfigurable intelligent surface (RIS) or intelligent reflecting surface (IRS) has been an attractive technology for future wireless communication and sensing systems. However, in the practical RIS, the mutual coupling effect among RIS elements, the reflection phase shift, and amplitude errors will degrade the RIS performance significantly. This paper investigates the two-dimensional direction-of-arrival (DOA) estimation problem in the scenario using a practical RIS. After formulating the system model with the mutual coupling effect and the reflection phase/amplitude errors of the RIS, a novel DNN-DANM method is proposed for the DOA estimation by combining the deep neural network (DNN) and the decoupling atomic norm minimization (DANM). The DNN step reconstructs the received signal from the one with RIS impairments, and the DANM step exploits the signal sparsity in the two-dimensional spatial domain. Additionally, a semi-definite programming (SDP) method with low computational complexity is proposed to solve the atomic minimization problem. Finally, both simulations and prototype experiments are carried out to show estimation performance, and the proposed method outperforms the existing methods in the two-dimensional DOA estimation with low complexity in the scenario with practical RIS.

Submitted 24 September, 2023; originally announced September 2023.

Comments: 11 pages, 12 figures

Journal ref: IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2023

arXiv:2309.13585 [pdf, other]

Identification of Ghost Targets for Automotive Radar in the Presence of Multipath

Authors: Le Zheng, Jiamin Long, Marco Lops, Fan Liu, Xueyao Hu

Abstract: Colocated multiple-input multiple-output (MIMO) technology has been widely used in automotive radars as it provides accurate angular estimation of the objects with relatively small number of transmitting and receiving antennas. Since the Direction Of Departure (DOD) and the Direction Of Arrival (DOA) of line-of-sight targets coincide, MIMO signal processing allows forming a larger virtual array for angle finding. However, multiple paths impinging the receiver is a major limiting factor, in that radar signals may bounce off obstacles, creating echoes for which the DOD does not equal the DOA. Thus, in complex scenarios with multiple scatterers, the direct paths of the intended targets may be corrupted by indirect paths from other objects, which leads to inaccurate angle estimation or ghost targets. In this paper, we focus on detecting the presence of ghosts due to multipath by regarding it as the problem of deciding between a composite hypothesis, ${\cal H}_0$ say, that the observations only contain an unknown number of direct paths sharing the same (unknown) DOD's and DOA's, and a composite alternative, ${\cal H}_1$ say, that the observations also contain an unknown number of indirect paths, for which DOD's and DOA's do not coincide. We exploit the Generalized Likelihood Ratio Test (GLRT) philosophy to determine the detector structure, wherein the unknown parameters are replaced by carefully designed estimators. The angles of both the active direct paths and of the multi-paths are indeed estimated through a sparsity-enforced Compressed Sensing (CS) approach with Levenberg-Marquardt (LM) optimization to estimate the angular parameters in the continuous domain. An extensive performance analysis is finally offered in order to validate the proposed solution.

Submitted 26 September, 2023; v1 submitted 24 September, 2023; originally announced September 2023.

Comments: 13 pages, 10 figures

arXiv:2309.05298 [pdf, other]

Real-Time Parallel Trajectory Optimization with Spatiotemporal Safety Constraints for Autonomous Driving in Congested Traffic

Authors: Lei Zheng, Rui Yang, Zengqi Peng, Haichao Liu, Michael Yu Wang, Jun Ma

Abstract: Multi-modal behaviors exhibited by surrounding vehicles (SVs) can typically lead to traffic congestion and reduce the travel efficiency of autonomous vehicles (AVs) in dense traffic. This paper proposes a real-time parallel trajectory optimization method for the AV to achieve high travel efficiency in dynamic and congested environments. A spatiotemporal safety module is developed to facilitate the safe interaction between the AV and SVs in the presence of trajectory prediction errors resulting from the multi-modal behaviors of the SVs. By leveraging multiple shooting and constraint transcription, we transform the trajectory optimization problem into a nonlinear programming problem, which allows for the use of optimization solvers and parallel computing techniques to generate multiple feasible trajectories in parallel. Subsequently, these spatiotemporal trajectories are fed into a multi-objective evaluation module considering both safety and efficiency objectives, such that the optimal feasible trajectory corresponding to the optimal target lane can be selected. The proposed framework is validated through simulations in a dense and congested driving scenario with multiple uncertain SVs. The results demonstrate that our method enables the AV to safely navigate through a dense and congested traffic scenario while achieving high travel efficiency and task accuracy in real time.

Submitted 11 September, 2023; originally announced September 2023.

Comments: 8 pages, 7 figures, accepted for publication in the 26th IEEE International Conference on Intelligent Transportation Systems (ITSC 2023)

arXiv:2308.05929 [pdf, other]

doi 10.1109/tiv.2024.3389827

Spatiotemporal Receding Horizon Control with Proactive Interaction Towards Autonomous Driving in Dense Traffic

Authors: Lei Zheng, Rui Yang, Zengqi Peng, Michael Yu Wang, Jun Ma

Abstract: In dense traffic scenarios, ensuring safety while keeping high task performance for autonomous driving is a critical challenge. To address this problem, this paper proposes a computationally-efficient spatiotemporal receding horizon control (ST-RHC) scheme to generate a safe, dynamically feasible, energy-efficient trajectory in control space, where different driving tasks in dense traffic can be achieved with high accuracy and safety in real time. In particular, an embodied spatiotemporal safety barrier module considering proactive interactions is devised to mitigate the effects of inaccuracies resulting from the trajectory prediction of other vehicles. Subsequently, the motion planning and control problem is formulated as a constrained nonlinear optimization problem, which favorably facilitates the effective use of off-the-shelf optimization solvers in conjunction with multiple shooting. The effectiveness of the proposed ST-RHC scheme is demonstrated through comprehensive comparisons with state-of-the-art algorithms on synthetic and real-world traffic datasets under dense traffic, and the attendant outcome of superior performance in terms of accuracy, efficiency and safety is achieved.

Submitted 26 May, 2024; v1 submitted 11 August, 2023; originally announced August 2023.

Comments: 16 pages, 13 figures, accepted for publication in IEEE Transactions on Intelligent Vehicles

arXiv:2308.02464 [pdf, ps, other]

Universal Approximation of Linear Time-Invariant (LTI) Systems through RNNs: Power of Randomness in Reservoir Computing

Authors: Shashank Jere, Lizhong Zheng, Karim Said, Lingjia Liu

Abstract: Recurrent neural networks (RNNs) are known to be universal approximators of dynamic systems under fairly mild and general assumptions. However, RNNs usually suffer from the issues of vanishing and exploding gradients in standard RNN training. Reservoir computing (RC), a special RNN where the recurrent weights are randomized and left untrained, has been introduced to overcome these issues and has demonstrated superior empirical performance especially in scenarios where training samples are extremely limited. On the other hand, the theoretical grounding to support this observed performance has yet to be fully developed. In this work, we show that RC can universally approximate a general linear time-invariant (LTI) system. Specifically, we present a clear signal processing interpretation of RC and utilize this understanding in the problem of approximating a generic LTI system. Under this setup, we analytically characterize the optimum probability density function for configuring (instead of training and/or randomly generating) the recurrent weights of the underlying RNN of the RC. Extensive numerical evaluations are provided to validate the optimality of the derived distribution for configuring the recurrent weights of the RC to approximate a general LTI system. Our work results in clear signal processing-based model interpretability of RC and provides theoretical explanation/justification for the power of randomness in randomly generating instead of training RC's recurrent weights. Furthermore, it provides a complete optimum analytical characterization for configuring the untrained recurrent weights, marking an important step towards explainable machine learning (XML) to incorporate domain knowledge for efficient learning.

Submitted 7 April, 2024; v1 submitted 4 August, 2023; originally announced August 2023.

Comments: This work has been accepted to IEEE Journal of Selected Topics in Signal Processing (JSTSP). Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2306.07505 [pdf]

Deep learning radiomics for assessment of gastroesophageal varices in people with compensated advanced chronic liver disease

Authors: Lan Wang, Ruiling He, Lili Zhao, Jia Wang, Zhengzi Geng, Tao Ren, Guo Zhang, Peng Zhang, Kaiqiang Tang, Chaofei Gao, Fei Chen, Liting Zhang, Yonghe Zhou, Xin Li, Fanbin He, Hui Huan, Wenjuan Wang, Yunxiao Liang, Juan Tang, Fang Ai, Tingyu Wang, Liyun Zheng, Zhongwei Zhao, Jiansong Ji, Wei Liu, et al. (22 additional authors not shown)

Abstract: Objective: Bleeding from gastroesophageal varices (GEV) is a medical emergency associated with high mortality. We aim to construct an artificial intelligence-based model of two-dimensional shear wave elastography (2D-SWE) of the liver and spleen to precisely assess the risk of GEV and high-risk gastroesophageal varices (HRV). Design: A prospective multicenter study was conducted in patients with compensated advanced chronic liver disease. 305 patients were enrolled from 12 hospitals, and finally 265 patients were included, with 1136 liver stiffness measurement (LSM) images and 1042 spleen stiffness measurement (SSM) images generated by 2D-SWE. We leveraged deep learning methods to uncover associations between image features and patient risk, and thus constructed models to predict GEV and HRV. Results: A multi-modality Deep Learning Risk Prediction model (DLRP) was constructed to assess GEV and HRV, based on LSM and SSM images, and clinical information. Validation analysis revealed that the AUCs of DLRP were 0.91 for GEV (95% CI 0.90 to 0.93, p < 0.05) and 0.88 for HRV (95% CI 0.86 to 0.89, p < 0.01), which were significantly and robustly better than canonical risk indicators, including the value of LSM and SSM. Moreover, DLRP was better than the model using individual parameters, including LSM and SSM images. In HRV prediction, the 2D-SWE images of SSM outperform LSM (p < 0.01). Conclusion: DLRP shows excellent performance in predicting GEV and HRV over canonical risk indicators LSM and SSM. Additionally, the 2D-SWE images of SSM provided more information for better accuracy in predicting HRV than the LSM.

Submitted 12 June, 2023; originally announced June 2023.

arXiv:2306.02003 [pdf, other]

On Optimal Caching and Model Multiplexing for Large Model Inference

Authors: Banghua Zhu, Ying Sheng, Lianmin Zheng, Clark Barrett, Michael I. Jordan, Jiantao Jiao

Abstract: Large Language Models (LLMs) and other large foundation models have achieved noteworthy success, but their size exacerbates existing resource consumption and latency challenges. In particular, the large-scale deployment of these models is hindered by the significant resource requirements during inference. In this paper, we study two approaches for mitigating these challenges: employing a cache to store previous queries and learning a model multiplexer to choose from an ensemble of models for query processing. Theoretically, we provide an optimal algorithm for jointly optimizing both approaches to reduce the inference cost in both offline and online tabular settings. By combining a caching algorithm, namely Greedy Dual Size with Frequency (GDSF) or Least Expected Cost (LEC), with a model multiplexer, we achieve optimal rates in both offline and online settings. Empirically, simulations show that the combination of our caching and model multiplexing algorithms greatly improves over the baselines, with up to $50\times$ improvement over the baseline when the ratio between the maximum cost and minimum cost is $100$. Experiments on real datasets show a $4.3\times$ improvement in FLOPs over the baseline when the ratio for FLOPs is $10$, and a $1.8\times$ improvement in latency when the ratio for average latency is $1.85$.

Submitted 28 August, 2023; v1 submitted 3 June, 2023; originally announced June 2023.
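The abstract above names Greedy Dual Size with Frequency (GDSF) as one of the caching policies it builds on. As a rough illustration only, the following is a minimal sketch of the textbook GDSF eviction rule, in which each entry has priority `L + frequency * cost / size` and `L` is an inflation value set to the priority of the last evicted entry. The class name `GDSFCache`, its methods, and the cost and size values are hypothetical and are not taken from the paper, which additionally combines caching with a learned model multiplexer that is not shown here.

```python
class GDSFCache:
    """Toy Greedy Dual Size with Frequency cache (illustrative sketch only)."""

    def __init__(self, capacity):
        self.capacity = capacity  # total size budget of the cache
        self.used = 0             # size currently occupied
        self.inflation = 0.0      # "L": priority of the most recently evicted entry
        self.entries = {}         # key -> {"value", "size", "cost", "freq", "priority"}

    def _priority(self, freq, cost, size):
        # Standard GDSF priority: inflation + frequency * cost / size.
        return self.inflation + freq * cost / size

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None  # cache miss
        # On a hit, bump the frequency and refresh the priority with the current L.
        entry["freq"] += 1
        entry["priority"] = self._priority(entry["freq"], entry["cost"], entry["size"])
        return entry["value"]

    def put(self, key, value, size, cost):
        if size > self.capacity:
            return False  # the item can never fit
        if key in self.entries:
            # Re-inserting a key resets its frequency in this simplified sketch.
            self.used -= self.entries.pop(key)["size"]
        # Evict lowest-priority entries until the new item fits.
        while self.used + size > self.capacity:
            victim = min(self.entries, key=lambda k: self.entries[k]["priority"])
            self.inflation = self.entries[victim]["priority"]
            self.used -= self.entries.pop(victim)["size"]
        self.entries[key] = {
            "value": value, "size": size, "cost": cost, "freq": 1,
            "priority": self._priority(1, cost, size),
        }
        self.used += size
        return True


# Hypothetical usage: costs stand in for the inference cost of answering a query.
cache = GDSFCache(capacity=2)
cache.put("cheap query", "answer A", size=1, cost=1.0)
cache.put("costly query", "answer B", size=1, cost=10.0)
cache.get("cheap query")                             # one extra hit on the cheap entry
cache.put("new query", "answer C", size=1, cost=5.0) # forces one eviction
```

In this toy run the cheap entry is evicted even though it was hit more often, because its frequency-weighted cost per unit size is still the lowest; this is the behavior that makes GDSF attractive when recomputation costs vary widely between queries.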

arXiv:2305.13487 [pdf, other]

Learning to Estimate: A Real-Time Online Learning Framework for MIMO-OFDM Channel Estimation

Authors: Lianjun Li, Sai Sree Rayala, Jiarui Xu, Lizhong Zheng, Lingjia Liu

Abstract: In this paper we introduce StructNet-CE, a novel real-time online learning framework for MIMO-OFDM channel estimation, which only utilizes over-the-air (OTA) pilot symbols for online training and converges within one OFDM subframe. The design of StructNet-CE leverages the structure information in the MIMO-OFDM system, including the repetitive structure of modulation constellation and the invariant property of symbol classification to inter-stream interference. The embedded structure information enables StructNet-CE to conduct channel estimation with a binary classification task and accurately learn channel coefficients with as few as two pilot OFDM symbols. Experiments show that the channel estimation performance is significantly improved with the incorporation of structure knowledge. StructNet-CE is compatible and readily applicable to current and future wireless networks, demonstrating the effectiveness and importance of combining machine learning techniques with domain knowledge for wireless communication systems.

Submitted 22 May, 2023; originally announced May 2023.

arXiv:2305.11469 [pdf, other]

doi 10.1016/j.engappai.2020.104151

(Rectified Version) The Barzilai-Borwein Method for Distributed Optimization over Unbalanced Directed Networks

Authors: Jinhui Hu, Xin Chen, Lifeng Zheng, Ling Zhang, Huaqing Li

Abstract: This paper studies optimization problems over multi-agent systems, in which all agents cooperatively minimize a global objective function expressed as a sum of local cost functions. Each agent in the systems uses only local computation and communication in the overall process without leaking their private information. Based on the Barzilai-Borwein (BB) method and multi-consensus inner loops, a distributed algorithm with the availability of larger stepsizes and accelerated convergence, namely ADBB, is proposed. Moreover, owing to employing only row-stochastic weight matrices, ADBB can resolve the optimization problems over unbalanced directed networks without requiring the knowledge of neighbors' out-degree for each agent. Via establishing contraction relationships between the consensus error, the optimality gap, and the gradient tracking error, ADBB is theoretically proved to converge linearly to the globally optimal solution. A real-world data set is used in simulations to validate the correctness of the theoretical analysis.

Submitted 28 February, 2024; v1 submitted 19 May, 2023; originally announced May 2023.

Comments: 33 pages, 8 figures

Journal ref: Engineering Applications of Artificial Intelligence 99 (2021) 104151
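For readers unfamiliar with the Barzilai-Borwein (BB) step size mentioned in the abstract above, the following is a minimal single-machine sketch, not the distributed ADBB algorithm itself. It uses the classical BB rule `alpha = (s·s) / (s·y)`, with `s` the difference of consecutive iterates and `y` the difference of consecutive gradients. The function name `bb_gradient_descent`, the fallback step `alpha0`, and the small quadratic test problem are all assumptions made for illustration.

```python
def bb_gradient_descent(grad, x0, alpha0=0.1, iters=50, tol=1e-10):
    """Gradient descent with the classical (long) Barzilai-Borwein step size.

    Centralized illustration only; consensus loops and gradient tracking,
    as used in distributed variants, are deliberately omitted.
    """
    x_prev = list(x0)
    g_prev = grad(x_prev)
    # First step uses a fixed step size because no history exists yet.
    x = [xi - alpha0 * gi for xi, gi in zip(x_prev, g_prev)]
    for _ in range(iters):
        g = grad(x)
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break  # gradient norm is small enough
        s = [a - b for a, b in zip(x, x_prev)]   # iterate difference
        y = [a - b for a, b in zip(g, g_prev)]   # gradient difference
        sy = sum(si * yi for si, yi in zip(s, y))
        ss = sum(si * si for si in s)
        # BB step; fall back to alpha0 if the curvature estimate is unusable.
        alpha = ss / sy if sy > 1e-16 else alpha0
        x_prev, g_prev = x, g
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical test problem: f(x) = 0.5*x0^2 + 5*x1^2 - x0 - 10*x1,
# whose unique minimizer is (1, 1).
def quadratic_grad(x):
    return [x[0] - 1.0, 10.0 * x[1] - 10.0]


x_star = bb_gradient_descent(quadratic_grad, [0.0, 0.0])
```

The appeal of the BB rule is that it adapts the step to local curvature using only quantities already available from two consecutive iterations, which is consistent with the abstract's remark about larger stepsizes and accelerated convergence.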

arXiv:2303.08312 [pdf, ps, other]

Interference-Aware Constellation Design for Z-Interference Channels with Imperfect CSI

Authors: Xinliang Zhang, Mojtaba Vaezi, Lizhong Zheng

Abstract: A deep autoencoder (DAE)-based end-to-end communication over the two-user Z-interference channel (ZIC) with finite-alphabet inputs is designed in this paper. The design is for imperfect channel state information (CSI) where both estimation and quantization errors exist. The proposed structure jointly optimizes the encoders and decoders to generate interference-aware constellations that adapt their shape to the interference intensity in order to minimize the bit error rate. A normalization layer is designed to guarantee an average power constraint in the DAE while allowing the architecture to generate constellations with nonuniform shapes. This brings further shaping gain compared to standard uniform constellations such as quadrature amplitude modulation. The performance of the DAE-ZIC is compared with two conventional methods, i.e., standard and rotated constellations. The proposed structure significantly enhances the performance of the ZIC. Simulation results confirm bit error rate reduction in all interference regimes (weak, moderate, and strong). At a signal-to-noise ratio of 20dB, the improvements reach about two orders of magnitude when only quantization error exists, indicating that the DAE-ZIC is highly robust to the interference compared to the conventional methods.

Submitted 14 March, 2023; originally announced March 2023.

Comments: Accepted in the IEEE ICC 2023. 6 pages, 5 figures

MSC Class: 94-10

arXiv:2303.02673 [pdf, other]

Time-frequency Network for Robust Speaker Recognition

Authors: Jiguo Li, Tianzi Zhang, Xiaobin Liu, Lirong Zheng

Abstract: The wide deployment of speech-based biometric systems usually demands high-performance speaker recognition algorithms. However, most of the prior works for speaker recognition either process the speech in the frequency domain or time domain, which may produce suboptimal results because both time and frequency domains are important for speaker recognition. In this paper, we attempt to analyze the speech signal in both time and frequency domains and propose the time-frequency network (TFN) for speaker recognition by extracting and fusing the features in the two domains. Based on the recent advance of deep neural networks, we propose a convolution neural network to encode the raw speech waveform and the frequency spectrum into domain-specific features, which are then fused and transformed into a classification feature space for speaker recognition. Experimental results on the publicly available datasets TIMIT and LibriSpeech show that our framework effectively combines the information in the two domains and performs better than the state-of-the-art methods for speaker recognition.

Submitted 6 March, 2023; v1 submitted 5 March, 2023; originally announced March 2023.

Comments: 5 pages, 3 figures

arXiv:2301.10455 [pdf, other]

Rate-Perception Optimized Preprocessing for Video Coding

Authors: Chengqian Ma, Zhiqiang Wu, Chunlei Cai, Pengwei Zhang, Yi Wang, Long Zheng, Chao Chen, Quan Zhou

Abstract: In the past decades, much progress has been made in the video compression field, including traditional video codecs and learning-based video codecs. However, few studies focus on using preprocessing techniques to improve the rate-distortion performance. In this paper, we propose a rate-perception optimized preprocessing (RPP) method. We first introduce an adaptive Discrete Cosine Transform loss function which can save the bitrate and keep essential high frequency components as well. Furthermore, we also combine several state-of-the-art techniques from low-level vision fields into our approach, such as the high-order degradation model, efficient lightweight network design, and Image Quality Assessment model. By jointly using these powerful techniques, our RPP approach can achieve, on average, 16.27% bitrate saving with different video encoders like AVC, HEVC, and VVC under multiple quality metrics. In the deployment stage, our RPP method is very simple and efficient, and does not require any changes in the setting of video encoding, streaming, and decoding. Each input frame only needs to make a single pass through RPP before being sent into video encoders. In addition, in our subjective visual quality test, 87% of users think videos with RPP are better than or equal to videos compressed using only the codec, while these videos with RPP save about 12% bitrate on average. Our RPP framework has been integrated into the production environment of our video transcoding services which serve millions of users every day.

Submitted 25 January, 2023; originally announced January 2023.

arXiv:2212.13053 [pdf, ps, other]

doi 10.1109/LRA.2021.3062805

Learning-based Predictive Path Following Control for Nonlinear Systems Under Uncertain Disturbances

Authors: Rui Yang, Lei Zheng, Jiesen Pan, Hui Cheng

Abstract: Accurate path following is challenging for autonomous robots operating in uncertain environments. Adaptive and predictive control strategies are crucial for a nonlinear robotic system to achieve high-performance path following control. In this paper, we propose a novel learning-based predictive control scheme that couples a high-level model predictive path following controller (MPFC) with a low-level learning-based feedback linearization controller (LB-FBLC) for nonlinear systems under uncertain disturbances. The low-level LB-FBLC utilizes Gaussian Processes to learn the uncertain environmental disturbances online and tracks the reference state accurately with a probabilistic stability guarantee. Meanwhile, the high-level MPFC exploits the linearized system model augmented with a virtual linear path dynamics model to optimize the evolution of path reference targets, and provides the reference states and controls for the low-level LB-FBLC. Simulation results illustrate the effectiveness of the proposed control strategy on a quadrotor path following task under unknown wind disturbances.

Submitted 26 December, 2022; originally announced December 2022.

Comments: 8 pages, 7 figures, accepted for publication in IEEE Robotics and Automation Letters (Volume: 6, Issue: 2, April 2021)

arXiv:2211.10040 [pdf, ps, other]

DASECount: Domain-Agnostic Sample-Efficient Wireless Indoor Crowd Counting via Few-shot Learning

Authors: Huawei Hou, Suzhi Bi, Lili Zheng, Xiaohui Lin, Yuan Wu, Zhi Quan

Abstract: Accurate indoor crowd counting (ICC) is a key enabler to many smart home/office applications. In this paper, we propose a Domain-Agnostic and Sample-Efficient wireless indoor crowd Counting (DASECount) framework that suffices to attain robust cross-domain detection accuracy given very limited data samples in new domains. DASECount leverages the wisdom of few-shot learning (FSL) paradigm consisting of two major stages: source domain meta training and target domain meta testing. Specifically, in the meta-training stage, we design and train two separate convolutional neural network (CNN) modules on the source domain dataset to fully capture the implicit amplitude and phase features of CSI measurements related to human activities. A subsequent knowledge distillation procedure is designed to iteratively update the CNN parameters for better generalization performance. In the meta-testing stage, we use the partial CNN modules to extract low-dimension features out of the high-dimension input target domain CSI data. With the obtained low-dimension CSI features, we can even use very few shots of target domain data samples (e.g., 5-shot samples) to train a lightweight logistic regression (LR) classifier, and attain very high cross-domain ICC accuracy. Experiment results show that the proposed DASECount method achieves over 92.68%, and on average 96.37% detection accuracy in a 0-8 people counting task under various domain setups, which significantly outperforms the other representative benchmark methods considered.

Submitted 18 November, 2022; originally announced November 2022.

Comments: 12 pages, 10 figures. The paper has been submitted for journal publication

arXiv:2210.06696 [pdf, other]

CPSAA: Accelerating Sparse Attention using Crossbar-based Processing-In-Memory Architecture

Authors: Huize Li, Hai Jin, Long Zheng, Yu Huang, Xiaofei Liao, Dan Chen, Zhuohui Duan, Cong Liu, Jiahong Xu, Chuanyi Gui

Abstract: The attention mechanism requires huge computational efforts to process unnecessary calculations, significantly limiting the system's performance. Researchers propose sparse attention to convert some DDMM operations to SDDMM and SpMM operations. However, current sparse attention solutions introduce massive off-chip random memory access. We propose CPSAA, a novel crossbar-based PIM-featured sparse attention accelerator. First, we present a novel attention calculation mode. Second, we design a novel PIM-based sparsity pruning architecture. Finally, we present novel crossbar-based methods. Experimental results show that CPSAA has an average of 89.6X, 32.2X, 17.8X, 3.39X, and 3.84X performance improvement and 755.6X, 55.3X, 21.3X, 5.7X, and 4.9X energy-saving when compared with GPU, FPGA, SANGER, ReBERT, and ReTransformer.

Submitted 7 October, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

Comments: 14 pages, 19 figures

arXiv:2210.02214 [pdf, other]

URGLQ: An Efficient Covariance Matrix Reconstruction Method for Robust Adaptive Beamforming

Authors: Tao Luo, Peng Chen, Zhenxin Cao, Le Zheng, Zongxin Wang

Abstract: The computational complexity of the conventional adaptive beamformer is relatively large, and the performance degrades significantly due to the model mismatch errors and the unwanted signals in received data. In this paper, an efficient unwanted signal removal and Gauss-Legendre quadrature (URGLQ)-based covariance matrix reconstruction method is proposed. Different from the prior covariance matrix reconstruction methods, a projection matrix is constructed to remove the unwanted signal from the received data, which improves the reconstruction accuracy of the covariance matrix. Considering that the computational complexity of most matrix reconstruction algorithms is relatively large due to the integral operation, we propose a Gauss-Legendre quadrature-based method to approximate the integral operation while maintaining accuracy. Moreover, to improve the robustness of the beamformer, the mismatch in the desired steering vector is corrected by maximizing the output power of the beamformer under a constraint that the corrected steering vector cannot converge to any interference steering vector. Simulation results and prototype experiments demonstrate that the proposed beamformer outperforms the compared methods and is much closer to the optimal beamformer in different scenarios.

Submitted 28 March, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

Comments: 11 pages, 16 figures

arXiv:2210.00446 [pdf, other]

doi 10.1109/MSP.2023.3272881

Seventy Years of Radar and Communications: The Road from Separation to Integration

Authors: Fan Liu, Le Zheng, Yuanhao Cui, Christos Masouros, Athina P. Petropulu, Hugh Griffiths, Yonina C. Eldar

Abstract: Radar and communications (R&C) as key utilities of electromagnetic (EM) waves have fundamentally shaped human society and triggered the modern information age. Although R&C have been historically progressing separately, in recent decades they have been converging towards integration, forming integrated sensing and communication (ISAC) systems, giving rise to new, highly desirable capabilities in next-generation wireless networks and future radars. To better understand the essence of ISAC, this paper provides a systematic overview on the historical development of R&C from a signal processing (SP) perspective. We first interpret the duality between R&C as signals and systems, followed by an introduction of their fundamental principles. We then elaborate on the two main trends in their technological evolution, namely, the increase of frequencies and bandwidths, and the expansion of antenna arrays. We then show how the intertwined narratives of R&C evolved into ISAC, and discuss the resultant SP framework. Finally, we overview future research directions in this field.

Submitted 30 April, 2023; v1 submitted 2 October, 2022; originally announced October 2022.

Comments: 14 pages, 8 figures, accepted by IEEE Signal Processing Magazine

arXiv:2208.09287 [pdf, other]

doi 10.1109/TCOMM.2023.3292468

Detect to Learn: Structure Learning with Attention and Decision Feedback for MIMO-OFDM Receive Processing

Authors: Jiarui Xu, Lianjun Li, Lizhong Zheng, Lingjia Liu

Abstract: The limited over-the-air (OTA) pilot symbols in multiple-input-multiple-output orthogonal-frequency-division-multiplexing (MIMO-OFDM) systems present a major challenge for detecting transmitted data symbols at the receiver, especially for machine learning-based approaches. While it is crucial to explore effective ways to exploit pilots, one can also take advantage of the data symbols to improve detection performance. Thus, this paper introduces an online attention-based approach, namely RC-AttStructNet-DF, that can efficiently utilize pilot symbols and be dynamically updated with the detected payload data using the decision feedback (DF) mechanism. Reservoir computing (RC) is employed in the time domain network to facilitate efficient online training. The frequency domain network adopts the novel 2D multi-head attention (MHA) module to capture the time and frequency correlations, and the structural-based StructNet to facilitate the DF mechanism. The attention loss is designed to learn the frequency domain network. The DF mechanism further enhances detection performance by dynamically tracking the channel changes through detected data symbols. The effectiveness of the RC-AttStructNet-DF approach is demonstrated through extensive experiments in MIMO-OFDM and massive MIMO-OFDM systems with different modulation orders and under various scenarios.

Submitted 9 October, 2023; v1 submitted 17 August, 2022; originally announced August 2022.

Comments: Accepted to IEEE Transactions on Communications

arXiv:2206.15069 [pdf, other]

PVT-COV19D: Pyramid Vision Transformer for COVID-19 Diagnosis

Authors: Lilang Zheng, Jiaxuan Fang, Xiaorun Tang, Hanzhang Li, Jiaxin Fan, Tianyi Wang, Rui Zhou, Zhaoyan Yan

Abstract: With the outbreak of COVID-19, a large number of relevant studies have emerged in recent years. We propose an automatic COVID-19 diagnosis framework based on lung CT scan images, the PVT-COV19D. In order to accommodate the different dimensions of the image input, we first classified the images using Transformer models, then sampled the images in the dataset according to a normal distribution, and fed the sampling results into the modified PVTv2 model for training. A large number of experiments on the COV19-CT-DB dataset demonstrate the effectiveness of the proposed method.

Submitted 30 June, 2022; originally announced June 2022.

Comments: 8 pages, 1 figure

arXiv:2110.02219 [pdf, other]

doi 10.1109/TWC.2022.3155945

RC-Struct: A Structure-based Neural Network Approach for MIMO-OFDM Detection

Authors: Jiarui Xu, Zhou Zhou, Lianjun Li, Lizhong Zheng, Lingjia Liu

Abstract: In this paper, we introduce a structure-based neural network architecture, namely RC-Struct, for MIMO-OFDM symbol detection. The RC-Struct exploits the temporal structure of the MIMO-OFDM signals through reservoir computing (RC). A binary classifier leverages the repetitive constellation structure in the system to perform multi-class detection. The incorporation of RC allows the RC-Struct to be learned in a purely online fashion with extremely limited pilot symbols in each OFDM subframe. The binary classifier enables the efficient utilization of the precious online training symbols and allows an easy extension to high-order modulations without a substantial increase in complexity. Experiments show that the introduced RC-Struct outperforms both the conventional model-based symbol detection approaches and the state-of-the-art learning-based strategies in terms of bit error rate (BER). The advantages of RC-Struct over existing methods become more significant when rank and link adaptation are adopted. The introduced RC-Struct sheds light on combining communication domain knowledge and learning-based receive processing for 5G/5G-Advanced and Beyond.

Submitted 17 August, 2022; v1 submitted 3 October, 2021; originally announced October 2021.

Comments: 30 pages, 17 figures, journal submission IEEE Transactions on Wireless Communications

Journal ref: IEEE Transactions on Wireless Communications, vol. 21, no. 9, pp. 7181-7193, Sept. 2022

arXiv:2104.00239 [pdf, other]

Positive Sample Propagation along the Audio-Visual Event Line

Authors: Jinxing Zhou, Liang Zheng, Yiran Zhong, Shijie Hao, Meng Wang

Abstract: Visual and audio signals often coexist in natural environments, forming audio-visual events (AVEs). Given a video, we aim to localize video segments containing an AVE and identify its category. In order to learn discriminative features for a classifier, it is pivotal to identify the helpful (or positive) audio-visual segment pairs while filtering out the irrelevant ones, regardless of whether they are synchronized or not. To this end, we propose a new positive sample propagation (PSP) module to discover and exploit the closely related audio-visual pairs by evaluating the relationship within every possible pair. It can be done by constructing an all-pair similarity map between each audio and visual segment, and only aggregating the features from the pairs with high similarity scores. To encourage the network to extract highly correlated features for positive samples, a new audio-visual pair similarity loss is proposed. We also propose a new weighting branch to better exploit the temporal correlations in the weakly supervised setting. We perform extensive experiments on the public AVE dataset and achieve new state-of-the-art accuracy in both fully and weakly supervised settings, thus verifying the effectiveness of our method.

Submitted 5 April, 2021; v1 submitted 31 March, 2021; originally announced April 2021.

Comments: Accepted to CVPR 2021. Code is available at https://github.com/jasongief/PSP_CVPR_2021

arXiv:2102.12780 [pdf, other]

doi 10.1109/JSTSP.2021.3113120

An Overview of Signal Processing Techniques for Joint Communication and Radar Sensing

Authors: J. Andrew Zhang, Fan Liu, Christos Masouros, Robert W. Heath Jr., Zhiyong Feng, Le Zheng, Athina Petropulu

Abstract: Joint communication and radar sensing (JCR) represents an emerging research field aiming to integrate the above two functionalities into a single system, sharing a majority of hardware and signal processing modules and, in a typical case, sharing a single transmitted signal. It is recognised as a key approach in significantly improving spectrum efficiency, reducing device size, cost and power consumption, and improving performance thanks to potential close cooperation of the two functions. Advanced signal processing techniques are critical for making the integration efficient, from transmission signal design to receiver processing. This paper provides a comprehensive overview of JCR systems from the signal processing perspective, with a focus on the state of the art. A balanced coverage on both transmitter and receiver is provided for three types of JCR systems: communication-centric, radar-centric, and joint design and optimization.

Submitted 25 February, 2021; originally announced February 2021.

Comments: 18 pages, 3 figures, journal

Journal ref: IEEE Journal of Selected Topics in Signal Processing, 2021

arXiv:2012.14036 [pdf, other]

Aerial Imagery Pile burn detection using Deep Learning: the FLAME dataset

Authors: Alireza Shamsoshoara, Fatemeh Afghah, Abolfazl Razi, Liming Zheng, Peter Z Fulé, Erik Blasch

Abstract: Wildfires are one of the costliest and deadliest natural disasters in the US, causing damage to millions of hectares of forest resources and threatening the lives of people and animals. Of particular importance are risks to firefighters and operational forces, which highlights the need for leveraging technology to minimize danger to people and property. FLAME (Fire Luminosity Airborne-based Machine learning Evaluation) offers a dataset of aerial images of fires along with methods for fire detection and segmentation which can help firefighters and researchers to develop optimal fire management strategies. This paper provides a fire image dataset collected by drones during a prescribed burning of piled detritus in an Arizona pine forest. The dataset includes video recordings and thermal heatmaps captured by infrared cameras. The captured videos and images are annotated and labeled frame-wise to help researchers easily apply their fire detection and modeling algorithms. The paper also highlights solutions to two machine learning problems: (1) Binary classification of video frames based on the presence [and absence] of fire flames. An Artificial Neural Network (ANN) method is developed that achieved a 76% classification accuracy. (2) Fire detection using segmentation methods to precisely determine fire borders. A deep learning method is designed based on the U-Net up-sampling and down-sampling approach to extract a fire mask from the video frames. Our FLAME method approached a precision of 92% and a recall of 84%. Future research will expand the technique for free burning broadcast fire using thermal images.

Submitted 27 December, 2020; originally announced December 2020.

Comments: 27 Pages, 7 Figures, 4 Tables

arXiv:2012.07937 [pdf, other]

Template Matching with Ranks

Authors: Ery Arias-Castro, Lin Zheng

Abstract: We consider the problem of matching a template to a noisy signal. Motivated by some recent proposals in the signal processing literature, we suggest a rank-based method and study its asymptotic properties using some well-established techniques in empirical process theory combined with Hájek's projection method. The resulting estimator of the shift is shown to achieve a parametric rate of convergence and to be asymptotically normal. Some numerical simulations corroborate these findings.

Submitted 14 December, 2020; originally announced December 2020.

arXiv:2012.00711 [pdf, other]

Learning with Knowledge of Structure: A Neural Network-Based Approach for MIMO-OFDM Detection

Authors: Zhou Zhou, Shashank Jere, Lizhong Zheng, Lingjia Liu

Abstract: In this paper, we explore neural network-based strategies for performing symbol detection in a MIMO-OFDM system. Building on a reservoir computing (RC)-based approach towards symbol detection, we introduce a symmetric and decomposed binary decision neural network to take advantage of the structure knowledge inherent in the MIMO-OFDM system. To be specific, the binary decision neural network is added in the frequency domain utilizing the knowledge of the constellation. We show that the introduced symmetric neural network can decompose the original $M$-ary detection problem into a series of binary classification tasks, thus significantly reducing the neural network detector complexity while offering good generalization performance with limited training overhead. Numerical evaluations demonstrate that the introduced hybrid RC-binary decision detection framework performs close to maximum likelihood model-based symbol detection methods in terms of symbol error rate in the low SNR regime with imperfect channel state information (CSI).

Submitted 2 December, 2020; v1 submitted 1 December, 2020; originally announced December 2020.

Journal ref: 2020 IEEE Asilomar Conference on Signals, Systems, and Computers

arXiv:2009.06863 [pdf]

When Automatic Voice Disguise Meets Automatic Speaker Verification

Authors: Linlin Zheng, Jiakang Li, Meng Sun, Xiongwei Zhang, Thomas Fang Zheng

Abstract: The technique of transforming voices in order to hide the real identity of a speaker is called voice disguise, among which automatic voice disguise (AVD), which modifies the spectral and temporal characteristics of voices with miscellaneous algorithms, is easily conducted with software accessible to the public. AVD has posed a great threat to both human listening and automatic speaker verification (ASV). In this paper, we have found that ASV is not only a victim of AVD but could be a tool to beat some simple types of AVD. Firstly, three types of AVD, pitch scaling, vocal tract length normalization (VTLN) and voice conversion (VC), are introduced as representative methods. State-of-the-art ASV methods are subsequently utilized to objectively evaluate the impact of AVD on ASV by equal error rates (EER). Moreover, an approach to restore disguised voice to its original version is proposed by minimizing a function of ASV scores w.r.t. restoration parameters. Experiments are then conducted on disguised voices from Voxceleb, a dataset recorded in real-world noisy scenarios. The results have shown that, for the voice disguise by pitch scaling, the proposed approach obtains an EER around 7% compared to the 30% EER of a recently proposed baseline using the ratio of fundamental frequencies. The proposed approach generalizes well to restore the disguise with nonlinear frequency warping in VTLN by reducing its EER from 34.3% to 18.5%. However, it is difficult to restore the source speakers in VC by our approach, where more complex forms of restoration functions or other paralinguistic cues might be necessary to restore the nonlinear transform in VC. Finally, contrastive visualization on ASV features with and without restoration illustrates the role of the proposed approach in an intuitive way.

Submitted 15 September, 2020; originally announced September 2020.

Comments: accepted for publication

Journal ref: IEEE Transactions on Information Forensics and Security, 2020

arXiv:2009.04072 [pdf, other]

Template Matching and Change Point Detection by M-estimation

Authors: Ery Arias-Castro, Lin Zheng

Abstract: We consider the fundamental problem of matching a template to a signal. We do so by M-estimation, which encompasses procedures that are robust to gross errors (i.e., outliers). Using standard results from empirical process theory, we derive the convergence rate and the asymptotic distribution of the M-estimator under relatively mild assumptions. We also discuss the optimality of the estimator, both in finite samples in the minimax sense and in the large-sample limit in terms of local minimaxity and relative efficiency. Although most of the paper is dedicated to the study of the basic shift model in the context of a random design, we consider many extensions towards the end of the paper, including more flexible templates, fixed designs, the agnostic setting, and more.

Submitted 8 September, 2020; originally announced September 2020.

arXiv:2008.07514 [pdf, other]

Source Free Domain Adaptation with Image Translation

Authors: Yunzhong Hou, Liang Zheng

Abstract: Effort in releasing large-scale datasets may be compromised by privacy and intellectual property considerations. A feasible alternative is to release pre-trained models instead. While these models are strong on their original task (source domain), their performance might degrade significantly when deployed directly in a new environment (target domain), which might not contain labels for training under realistic settings. Domain adaptation (DA) is a known solution to the domain gap problem, but usually requires labeled source data. In this paper, we study the problem of source free domain adaptation (SFDA), whose distinctive feature is that the source domain only provides a pre-trained model, but no source data. Being source free adds significant challenges to DA, especially when considering that the target dataset is unlabeled. To solve the SFDA problem, we propose an image translation approach that transfers the style of target images to that of unseen source images. To this end, we align the batch-wise feature statistics of generated images to those stored in batch normalization layers of the pre-trained model. Compared with directly classifying target images, higher accuracy is obtained with these style transferred images using the pre-trained model. On several image classification datasets, we show that the above-mentioned improvements are consistent and statistically significant.

Submitted 16 May, 2021; v1 submitted 17 August, 2020; originally announced August 2020.

arXiv:2003.09488 [pdf, other]

Safe Reinforcement Learning of Control-Affine Systems with Vertex Networks

Authors: Liyuan Zheng, Yuanyuan Shi, Lillian J. Ratliff, Baosen Zhang

Abstract: This paper focuses on finding reinforcement learning policies for control systems with hard state and action constraints. Despite its success in many domains, reinforcement learning is challenging to apply to problems with hard constraints, especially if both the state variables and actions are constrained. Previous works seeking to ensure constraint satisfaction, or safety, have focused on adding a projection step to a learned policy. Yet, this approach requires solving an optimization problem at every policy execution step, which can lead to significant computational costs. To tackle this problem, this paper proposes a new approach, termed Vertex Networks (VNs), with guarantees on safety during exploration and on learned control policies by incorporating the safety constraints into the policy network architecture. Leveraging the geometric property that all points within a convex set can be represented as the convex combination of its vertices, the proposed algorithm first learns the convex combination weights and then uses these weights along with the pre-calculated vertices to output an action. The output action is guaranteed to be safe by construction. Numerical examples illustrate that the proposed VN algorithm outperforms vanilla reinforcement learning in a variety of benchmark control tasks.

Submitted 20 March, 2020; originally announced March 2020.

arXiv:1911.00635 [pdf, other]

Automatic Calibration of Dual-LiDARs Using Two Poles Stickered with Retro-Reflective Tape

Authors: Bohuan Xue, Jianhao Jiao, Yilong Zhu, Linwei Zheng, Dong Han, Ming Liu, Rui Fan

Abstract: Multi-LiDAR systems have been prevalently applied in modern autonomous vehicles to render a broad view of the environments. The rapid development of 5G wireless technologies has brought a breakthrough for current cellular vehicle-to-everything (C-V2X) applications. Therefore, a novel localization and perception system in which multiple LiDARs are mounted around cities for autonomous vehicles has been proposed. However, the existing calibration methods require specific hard-to-move markers, ego-motion, or good initial values given by users. In this paper, we present a novel approach that enables automatic multi-LiDAR calibration using two poles stickered with retro-reflective tape. This method does not depend on prior environmental information, initial values of the extrinsic parameters, or movable platforms like a car. We analyze the LiDAR-pole model, verify the feasibility of the algorithm through simulation data, and present a simple method to measure the calibration errors w.r.t. the ground truth. Experimental results demonstrate that our approach gains better flexibility and higher accuracy when compared with the state-of-the-art approach.

Submitted 1 November, 2019; originally announced November 2019.

Comments: 6 pages, 7 figures, 2019 IEEE Conference on Imaging Systems and Techniques (IST)

arXiv:1909.12901 [pdf, other]

doi 10.1007/978-3-030-46640-4_13

3D U-Net Based Brain Tumor Segmentation and Survival Days Prediction

Authors: Feifan Wang, Runzhou Jiang, Liqin Zheng, Chun Meng, Bharat Biswal

Abstract: The past few years have witnessed the prevalence of deep learning in many application scenarios, among which is medical image processing. Diagnosis and treatment of brain tumors require an accurate and reliable segmentation of brain tumors as a prerequisite. However, such work conventionally requires a significant amount of brain surgeons' time. Computer vision techniques could provide surgeons a relief from the tedious marking procedure. In this paper, a 3D U-net based deep learning model has been trained with the help of brain-wise normalization and patching strategies for the brain tumor segmentation task in the BraTS 2019 competition. Dice coefficients for enhancing tumor, tumor core, and the whole tumor are 0.737, 0.807 and 0.894 respectively on the validation dataset. These three values on the test dataset are 0.778, 0.798 and 0.852. Furthermore, numerical features including ratio of tumor size to brain size and the area of tumor surface as well as age of subjects are extracted from predicted tumor labels and have been used for the overall survival days prediction task. The accuracy could be 0.448 on the validation dataset, and 0.551 on the final test dataset.

Submitted 31 March, 2020; v1 submitted 15 September, 2019; originally announced September 2019.

Comments: Third place award of the 2019 MICCAI BraTS challenge survival task [BraTS 2019](https://www.med.upenn.edu/cbica/brats2019.html)

MSC Class: 68T45

arXiv:1902.08676 [pdf, other]

Radar and Communication Co-existence: an Overview

Authors: Le Zheng, Marco Lops, Yonina C. Eldar, Xiaodong Wang

Abstract: Increased amounts of bandwidth are required to guarantee both high-quality/high-rate wireless services (4G and 5G) and reliable sensing capabilities such as automotive radar, air traffic control, earth geophysical monitoring and security applications. Therefore, co-existence between radar and communication systems using overlapping bandwidths has been a primary investigation field in recent years. Various signal processing techniques such as interference mitigation, pre-coding or spatial separation, and waveform design allow both radar and communications to share the spectrum. This article reviews recent work on co-existence between radar and communication systems, including signal models, waveform design and signal processing techniques. Our goal is to survey contributions in this area in order to provide a primary starting point for new researchers interested in these problems.

Submitted 22 February, 2019; originally announced February 2019.

Comments: Submitted to IEEE Signal Processing Magazine

arXiv:1902.03436 [pdf, other]

doi 10.1109/TWC.2019.2929772

Interference Removal for Radar/Communication Co-existence: the Random Scattering Case

Authors: Yinchuan Li, Le Zheng, Marco Lops, Xiaodong Wang

Abstract: In this paper we consider an un-cooperative spectrum sharing scenario, wherein a radar system is to be overlaid to a pre-existing wireless communication system. Given the order of magnitude of the transmitted powers in play, we focus on the issue of interference mitigation at the communication receiver. We explicitly account for the reverberation produced by the (typically high-power) radar transmitter whose signal hits scattering centers (whether targets or clutter) producing interference onto the communication receiver, which is assumed to operate in an un-synchronized and un-coordinated scenario. We first show that receiver design amounts to solving a non-convex problem of joint interference removal and data demodulation: next, we introduce two algorithms, both exploiting sparsity of a proper representation of the interference and of the vector containing the errors of the data block. The first algorithm is basically a relaxed constrained Atomic Norm minimization, while the latter relies on a two-stage processing structure and is based on alternating minimization. The merits of these algorithms are demonstrated through extensive simulations: interestingly, the two-stage alternating minimization algorithm turns out to achieve satisfactory performance with moderate computational complexity.

Submitted 9 February, 2019; originally announced February 2019.

Showing 1–50 of 55 results for author: Zheng, L