A Review of Safe Reinforcement Learning Methods for Modern Power Systems

Authors: Tong Su, Tong Wu, Junbo Zhao, Anna Scaglione, Le Xie

Abstract: Due to the availability of more comprehensive measurement data in modern power systems, there has been significant interest in developing and applying reinforcement learning (RL) methods for operation and control. Conventional RL training is based on trial-and-error and reward feedback interaction with either a model-based simulated environment or a data-driven and model-free simulation environment. These methods often lead to the exploration of actions in unsafe regions of operation and, after training, the execution of unsafe actions when the RL policies are deployed in real power systems. A large body of literature has proposed safe RL strategies to prevent unsafe training policies. In power systems, safe RL represents a class of RL algorithms that can ensure or promote the safety of power system operations by executing safe actions while optimizing the objective function. While different papers handle the safety constraints differently, the overarching goal of safe RL methods is to determine how to train policies to satisfy safety constraints while maximizing rewards. This paper provides a comprehensive review of safe RL techniques and their applications in different power system operations and control, including optimal power generation dispatch, voltage control, stability control, electric vehicle (EV) charging control, buildings' energy management, electricity market, system restoration, and unit commitment and reserve scheduling. Additionally, the paper discusses benchmarks, challenges, and future directions for safe RL research in power systems.

Submitted 28 June, 2024; originally announced July 2024.

arXiv:2406.16990 [pdf, other]

AND: Audio Network Dissection for Interpreting Deep Acoustic Models

Authors: Tung-Yu Wu, Yu-Xiang Lin, Tsui-Wei Weng

Abstract: Neuron-level interpretations aim to explain network behaviors and properties by investigating neurons responsive to specific perceptual or structural input patterns. Although there is emerging work in the vision and language domains, none is explored for acoustic models. To bridge the gap, we introduce $\textit{AND}$, the first $\textbf{A}$udio $\textbf{N}$etwork $\textbf{D}$issection framework that automatically establishes natural language explanations of acoustic neurons based on highly-responsive audio. $\textit{AND}$ features the use of LLMs to summarize mutual acoustic features and identities among audio. Extensive experiments are conducted to verify $\textit{AND}$'s precise and informative descriptions. In addition, we demonstrate a potential use of $\textit{AND}$ for audio machine unlearning by conducting concept-specific pruning based on the generated descriptions. Finally, we highlight two acoustic model behaviors with analysis by $\textit{AND}$: (i) models discriminate audio with a combination of basic acoustic features rather than high-level abstract concepts; (ii) training strategies affect model behaviors and neuron interpretability -- supervised training guides neurons to gradually narrow their attention, while self-supervised learning encourages neurons to be polysemantic for exploring high-level features.

Submitted 26 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

Comments: Accepted by ICML'24

arXiv:2406.16876 [pdf, other]

Near-Field Mobile Tracking: A Framework of Using XL-RIS Information

Authors: Tuo Wu, Cunhua Pan, Kangda Zhi, Hong Ren, Maged Elkashlan, Chau Yuen

Abstract: This paper introduces a novel mobile tracking framework leveraging the high-dimensional signal received from extremely large-scale (XL) reconfigurable intelligent surfaces (RIS). This received signal, named XL-RIS information, has a much larger data dimension and therefore offers a richer feature set compared to the traditional base station (BS) received signal, i.e., BS information, enabling more accurate tracking of mobile users (MUs). As the first step, we present an XL-RIS information reconstruction (XL-RIS-IR) algorithm to reconstruct the high-dimensional XL-RIS information from the low-dimensional BS information. Building on this, this paper proposes a comprehensive framework for mobile tracking, consisting of a Feature Extraction Module and a Mobile Tracking Module. The Feature Extraction Module incorporates a convolutional neural network (CNN) extractor for spatial features, a time and frequency (T$\&$F) extractor for domain features, and a near-field angles of arrival (AoAs) extractor for capturing AoA features within the XL-RIS. These features are combined into a comprehensive feature vector, forming a time-varying sequence fed into the Mobile Tracking Module, which employs an Auto-encoder (AE) with a stacked bidirectional long short-term memory (Bi-LSTM) encoder and a standard LSTM decoder to predict MUs' positions in the upcoming time slot. Simulation results confirm that the tracking accuracy of our proposed framework is significantly enhanced by using reconstructed XL-RIS information and exhibits substantial robustness to signal-to-noise ratio (SNR) variations.

Submitted 3 April, 2024; originally announced June 2024.

arXiv:2405.11155 [pdf, other]

Inner-approximate Reachability Computation via Zonotopic Boundary Analysis

Authors: De** Ren, Zhen Liang, Chenyu Wu, Jianqiang Ding, Taoran Wu, Bai Xue

Abstract: Inner-approximate reachability analysis involves calculating subsets of reachable sets, known as inner-approximations. This analysis is crucial in the fields of dynamic systems analysis and control theory as it provides a reliable estimation of the set of states that a system can reach from given initial states at a specific time instant. In this paper, we study the inner-approximate reachability analysis problem based on the set-boundary reachability method for systems modelled by ordinary differential equations, in which the computed inner-approximations are represented with zonotopes. The set-boundary reachability method computes an inner-approximation by excluding states reached from the initial set's boundary. The effectiveness of this method is highly dependent on the efficient extraction of the exact boundary of the initial set. To address this, we propose methods leveraging boundary and tiling matrices that can efficiently extract and refine the exact boundary of the initial set represented by zonotopes. Additionally, we enhance the exclusion strategy by contracting the outer-approximations in a flexible way, which allows for the computation of less conservative inner-approximations. To evaluate the proposed method, we compare it with state-of-the-art methods against a series of benchmarks. The numerical results demonstrate that our method is not only efficient but also accurate in computing inner-approximations.

Submitted 21 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

Comments: the extended version of the paper accepted by CAV 2024

arXiv:2404.13289 [pdf, other]

Double Mixture: Towards Continual Event Detection from Speech

Authors: **gqi Kang, Tongtong Wu, **ming Zhao, Guitao Wang, Yinwei Wei, Hao Yang, Guilin Qi, Yuan-Fang Li, Gholamreza Haffari

Abstract: Speech event detection is crucial for multimedia retrieval, involving the tagging of both semantic and acoustic events. Traditional ASR systems often overlook the interplay between these events, focusing solely on content, even though the interpretation of dialogue can vary with environmental context. This paper tackles two primary challenges in speech event detection: the continual integration of new events without forgetting previous ones, and the disentanglement of semantic from acoustic events. We introduce a new task, continual event detection from speech, for which we also provide two benchmark datasets. To address the challenges of catastrophic forgetting and effective disentanglement, we propose a novel method, 'Double Mixture.' This method merges speech expertise with robust memory mechanisms to enhance adaptability and prevent forgetting. Our comprehensive experiments show that this task presents significant challenges that are not effectively addressed by current state-of-the-art methods in either computer vision or natural language processing. Our approach achieves the lowest rates of forgetting and the highest levels of generalization, proving robust across various continual learning sequences. Our code and data are available at https://anonymous.4open.science/status/Continual-SpeechED-6461.

Submitted 20 April, 2024; originally announced April 2024.

Comments: The first two authors contributed equally to this work

arXiv:2404.09007 [pdf, ps, other]

A Framework for Safe Probabilistic Invariance Verification of Stochastic Dynamical Systems

Authors: Taoran Wu, Yiqing Yu, Bican Xia, Ji Wang, Bai Xue

Abstract: Ensuring safety through set invariance has proven to be a valuable method in various robotics and control applications. This paper introduces a comprehensive framework for the safe probabilistic invariance verification of both discrete- and continuous-time stochastic dynamical systems over an infinite time horizon. The objective is to ascertain the lower and upper bounds of the liveness probability for a given safe set and set of initial states. This probability signifies the likelihood of the system remaining within the safe set indefinitely, starting from the set of initial states. To address this problem, we propose optimizations for verifying safe probabilistic invariance in discrete-time and continuous-time stochastic dynamical systems. These optimizations adapt classical stochastic barrier certificates, which are based on Doob's non-negative supermartingale inequality, and the equations described in [29],[31], which can precisely define the probability of reaching a target set while avoiding unsafe states. Finally, we demonstrate the effectiveness of these optimizations through several examples using semi-definite programming tools.

Submitted 13 April, 2024; originally announced April 2024.

arXiv:2404.07575 [pdf]

An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution

Authors: Tien-Hong Lo, Fu-An Chao, Tzu-I Wu, Yao-Ting Sung, Berlin Chen

Abstract: Automated speaking assessment (ASA) typically involves automatic speech recognition (ASR) and hand-crafted feature extraction from the ASR transcript of a learner's speech. Recently, self-supervised learning (SSL) has shown stellar performance compared to traditional methods. However, SSL-based ASA systems face at least three data-related challenges: limited annotated data, uneven distribution of learner proficiency levels, and non-uniform score intervals between different CEFR proficiency levels. To address these challenges, we explore the use of two novel modeling strategies: metric-based classification and loss reweighting, leveraging distinct SSL-based embedding features. Extensive experimental results on the ICNALE benchmark dataset suggest that our approach can outperform existing strong baselines by a sizable margin, achieving a significant improvement of more than 10% in CEFR prediction accuracy.

Submitted 11 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

Comments: Accepted to NAACL 2024 Findings

arXiv:2403.16529 [pdf, other]

Exploit High-Dimensional RIS Information to Localization: What Is the Impact of Faulty Element?

Authors: Tuo Wu, Cunhua Pan, Kangda Zhi, Hong Ren, Maged Elkashlan, Cheng-Xiang Wang, Robert Schober, Xiaohu You

Abstract: This paper proposes a novel localization algorithm using the reconfigurable intelligent surface (RIS) received signal, i.e., RIS information. Compared with the BS received signal, i.e., BS information, RIS information offers a higher dimension and a richer feature set, thereby providing an enhanced capacity to distinguish positions of the mobile users (MUs). Additionally, we address a practical scenario where the RIS contains some faulty elements, unknown in number and location, that cannot receive signals. Initially, we employ transfer learning to design a two-phase transfer learning (TPTL) algorithm for accurate detection of faulty elements. Then our objective is to regain the information lost from the faulty elements and reconstruct the complete high-dimensional RIS information for localization. To this end, we propose a transfer-enhanced dual-stage (TEDS) algorithm. In \emph{Stage I}, we integrate the CNN and variational autoencoder (VAE) to obtain the RIS information, which in \emph{Stage II} is input to the transferred DenseNet 121 to estimate the location of the MU. To gain more insight, we propose an alternative algorithm named transfer-enhanced direct fingerprint (TEDF) algorithm which only requires the BS information. The comparison between TEDS and TEDF reveals the effectiveness of faulty element detection and the benefits of utilizing the high-dimensional RIS information for localization. Besides, our empirical results demonstrate that the performance of the localization algorithm is dominated by the high-dimensional RIS information and is robust to unoptimized phase shifts and signal-to-noise ratio (SNR).

Submitted 28 May, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

Comments: 17 pages, Accepted by IEEE JSAC

arXiv:2403.16521 [pdf, other]

Employing High-Dimensional RIS Information for RIS-aided Localization Systems

Authors: Tuo Wu, Cunhua Pan, Kangda Zhi, Hong Ren, Maged Elkashlan, Jiangzhou Wang, Chau Yuen

Abstract: Reconfigurable intelligent surface (RIS)-aided localization systems have attracted extensive research attention due to their accuracy enhancement capabilities. However, most studies primarily utilized the base stations (BS) received signal, i.e., BS information, for localization algorithm design, neglecting the potential of RIS received signal, i.e., RIS information. Compared with BS information, RIS information offers higher dimension and richer feature set, thereby significantly improving the ability to extract positions of the mobile users (MUs). Addressing this oversight, this paper explores the algorithm design based on the high-dimensional RIS information. Specifically, we first propose a RIS information reconstruction (RIS-IR) algorithm to reconstruct the high-dimensional RIS information from the low-dimensional BS information. The proposed RIS-IR algorithm comprises a data processing module for preprocessing BS information, a convolution neural network (CNN) module for feature extraction, and an output module for outputting the reconstructed RIS information. Then, we propose a transfer learning based fingerprint (TFBF) algorithm that employs the reconstructed high-dimensional RIS information for MU localization. This involves adapting a pre-trained DenseNet-121 model to map the reconstructed RIS signal to the MU's three-dimensional (3D) position. Empirical results affirm that the localization performance is significantly influenced by the high-dimensional RIS information and maintains robustness against unoptimized phase shifts.

Submitted 16 May, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.10522 [pdf, other]

doi 10.1109/WACV57701.2024.00770

Ordinal Classification with Distance Regularization for Robust Brain Age Prediction

Authors: Jay Shah, Md Mahfuzur Rahman Siddiquee, Yi Su, Teresa Wu, Baoxin Li

Abstract: Age is one of the major known risk factors for Alzheimer's Disease (AD). Detecting AD early is crucial for effective treatment and preventing irreversible brain damage. Brain age, a measure derived from brain imaging reflecting structural changes due to aging, may have the potential to identify AD onset, assess disease risk, and plan targeted interventions. Deep learning-based regression techniques to predict brain age from magnetic resonance imaging (MRI) scans have shown great accuracy recently. However, these methods are subject to an inherent regression-to-the-mean effect, which causes a systematic bias resulting in an overestimation of brain age in young subjects and underestimation in old subjects. This weakens the reliability of predicted brain age as a valid biomarker for downstream clinical applications. Here, we reformulate the brain age prediction task from regression to classification to address the issue of systematic bias. Recognizing the importance of preserving ordinal information from ages to understand aging trajectory and monitor aging longitudinally, we propose a novel ORdinal Distance Encoded Regularization (ORDER) loss that incorporates the order of age labels, enhancing the model's ability to capture age-related patterns. Extensive experiments and ablation studies demonstrate that this framework reduces systematic bias, outperforms state-of-the-art methods by statistically significant margins, and can better capture subtle differences between clinical groups in an independent AD dataset. Our implementation is publicly available at https://github.com/jaygshah/Robust-Brain-Age-Prediction.

Submitted 6 May, 2024; v1 submitted 25 October, 2023; originally announced March 2024.

Comments: Accepted in WACV 2024

arXiv:2403.10323 [pdf, ps, other]

Joint Optimization for Achieving Covertness in MIMO Over-the-Air Computation Networks

Authors: Junteng Yao, Tuo Wu, Ming **, Cunhua Pan, Quanzhong Li, **hong Yuan

Abstract: This paper investigates covert data transmission within a multiple-input multiple-output (MIMO) over-the-air computation (AirComp) network, where sensors transmit data to the access point (AP) while guaranteeing covertness to the warden (Willie). Simultaneously, the AP introduces artificial noise (AN) to confuse Willie, meeting the covert requirement. We address the challenge of minimizing the mean-square-error (MSE) at the AP, while considering transmit power constraints at both the AP and the sensors, as well as ensuring the covert transmission to Willie with a low detection error probability (DEP). However, obtaining globally optimal solutions for the investigated non-convex problem is challenging due to the interdependence of optimization variables. To tackle this problem, we introduce an exact penalty algorithm and transform the optimization problem into a difference-of-convex (DC) form problem to find a locally optimal solution. Simulation results showcase the superior performance of our proposed scheme compared with the benchmark schemes.

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.00453 [pdf, ps, other]

Exploring Fairness for FAS-assisted Communication Systems: from NOMA to OMA

Authors: Junteng Yao, Liaoshi Zhou, Tuo Wu, Ming **, Cunhua Pan, Maged Elkashlan, Kai-Kit Wong

Abstract: This paper addresses the fairness issue within fluid antenna system (FAS)-assisted non-orthogonal multiple access (NOMA) and orthogonal multiple access (OMA) systems, where a single fixed-antenna base station (BS) transmits superposition-coded signals to two users, each with a single fluid antenna. We define fairness through the minimization of the maximum outage probability for the two users, under total resource constraints for both FAS-assisted NOMA and OMA systems. Specifically, in the FAS-assisted NOMA systems, we study both a special case and the general case, deriving a closed-form solution for the former and applying a bisection search method to find the optimal solution for the latter. Moreover, for the general case, we derive a locally optimal closed-form solution to achieve fairness. In the FAS-assisted OMA systems, to deal with the non-convex optimization problem with coupling of the variables in the objective function, we employ an approximation strategy to facilitate a successive convex approximation (SCA)-based algorithm, achieving locally optimal solutions for both cases. Empirical analysis validates that our proposed solutions outperform conventional NOMA and OMA benchmarks in terms of fairness.

Submitted 1 March, 2024; originally announced March 2024.

arXiv:2402.17167 [pdf, ps, other]

Converse Barrier Certificates for Finite-time Safety Verification of Continuous-time Perturbed Deterministic Systems

Authors: Yonghan Li, Chenyu Wu, Taoran Wu, Shijie Wang, Bai Xue

Abstract: In this paper, we investigate the problem of verifying the finite-time safety of continuous-time perturbed deterministic systems represented by ordinary differential equations in the presence of measurable disturbances. Given a finite time horizon, if the system is safe, it, starting from a compact initial set, will remain within an open and bounded safe region throughout the specified time horizon, regardless of the disturbances. The main contribution of this work is to uncover that there exists a time-dependent barrier certificate if and only if the system is safe. This barrier certificate satisfies the following conditions: negativity over the initial set at the initial time instant, non-negativity over the boundary of the safe set, and non-increasing behavior along the system dynamics over the specified finite time horizon. The existence problem is explored using a Hamilton-Jacobi differential equation, which has a unique Lipschitz viscosity solution.

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.02159 [pdf, ps, other]

FAS-assisted Wireless Powered Communication Systems

Authors: Xiazhi Lai, Kangda Zhi, Wanyi Li, Tuo Wu, Cunhua Pan, Maged Elkashlan

Abstract: Fluid Antenna System (FAS) is recognized as a promising technology for enhancing communication performance. In this context, we explore the potential of FAS-assisted wireless powered communication systems. Specifically, the transmitter, equipped with FAS, harvests the radio frequency (RF) signal from a power beacon and utilizes the harvested energy for data transmission to the receiver. To evaluate the performance of the considered systems, we derive both the analytical and asymptotic expressions of the outage probability. Simulation results indicate that the diversity of the considered network closely aligns with the number of ports. Besides, it is also revealed that the port selection criteria based solely on single-hop configurations yield a diversity order of only one.

Submitted 3 February, 2024; originally announced February 2024.

arXiv:2402.02122 [pdf, other]

Secure Wireless Communication in Active RIS-Assisted DFRC System

Authors: Yang Zhang, Hong Ren, Cunhua Pan, Boshi Wang, Zhiyuan Yu, Ruisong Weng, Tuo Wu, Yongchao He

Abstract: This work considers a dual-functional radar and communication (DFRC) system with an active reconfigurable intelligent surface (RIS) and a potential eavesdropper. Our purpose is to maximize the secrecy rate (SR) of the system by jointly designing the beamforming matrix at the DFRC base station (BS) and the reflecting coefficients at the active RIS, subject to the signal-to-interference-plus-noise-ratio (SINR) constraint of the radar echo and the power consumption constraints at the DFRC-BS and active RIS. An alternating optimization (AO) algorithm based on semi-definite relaxation (SDR) and majorization-minimization (MM) is applied to solve the SR-maximization problem by alternately optimizing the beamforming matrix and the reflecting coefficients. Specifically, we first apply the SDR and successive convex approximation (SCA) methods to transform the two subproblems into more tractable forms, then the MM method is applied to derive a concave surrogate function and iteratively solve the subproblems. Finally, simulation results indicate that the active RIS can better confront the impact of "multiplicative fading" and outperforms traditional passive RIS in terms of both secure data rate and radar sensing performance.

Submitted 3 February, 2024; originally announced February 2024.

Comments: 13 pages, 9 figures

arXiv:2312.17583 [pdf, other]

Enhancing the Performance of DeepReach on High-Dimensional Systems through Optimizing Activation Functions

Authors: Qian Wang, Tianhao Wu

Abstract: With the continuous advancement in autonomous systems, it becomes crucial to provide robust safety guarantees for safety-critical systems. Hamilton-Jacobi Reachability Analysis is a formal verification method that guarantees performance and safety for dynamical systems and is widely applicable to various tasks and challenges. Traditionally, reachability problems are solved by using grid-based methods, whose computational and memory cost scales exponentially with the dimensionality of the system. To overcome this challenge, DeepReach, a deep learning-based approach that approximately solves high-dimensional reachability problems, has been proposed and has shown considerable promise. In this paper, we aim to improve the performance of DeepReach on high-dimensional systems by exploring different choices of activation functions. We first run experiments on a 3D system as a proof of concept. Then we demonstrate the effectiveness of our approach on a 9D multi-vehicle collision problem.

Submitted 29 December, 2023; originally announced December 2023.

arXiv:2311.10245 [pdf, other]

Segment Anything in Defect Detection

Authors: Bozhen Hu, Bin Gao, Cheng Tan, Tongle Wu, Stan Z. Li

Abstract: Defect detection plays a crucial role in infrared non-destructive testing systems, offering non-contact, safe, and efficient inspection capabilities. However, challenges such as low resolution, high noise, and uneven heating in infrared thermal images hinder comprehensive and accurate defect detection. In this study, we propose DefectSAM, a novel approach for segmenting defects on highly noisy thermal images based on the widely adopted model, Segment Anything (SAM). Harnessing the power of a meticulously curated dataset generated through labor-intensive lab experiments and valuable prompts from experienced experts, DefectSAM surpasses existing state-of-the-art segmentation algorithms and achieves significant improvements in defect detection rates. Notably, DefectSAM excels in detecting weaker and smaller defects on complex and irregular surfaces, reducing the occurrence of missed detections and providing more accurate defect size estimations. Experimental studies conducted on various materials have validated the effectiveness of our solutions in defect detection, which hold significant potential to expedite the evolution of defect detection tools, enabling enhanced inspection capabilities and accuracy in defect identification.

Submitted 16 November, 2023; originally announced November 2023.

arXiv:2310.19477 [pdf, other]

VDIP-TGV: Blind Image Deconvolution via Variational Deep Image Prior Empowered by Total Generalized Variation

Authors: Tingting Wu, Zhiyan Du, Zhi Li, Feng-Lei Fan, Tieyong Zeng

Abstract: Recovering clear images from blurry ones with an unknown blur kernel is a challenging problem. Deep image prior (DIP) proposes to use the deep network as a regularizer for a single image rather than as a supervised model, which achieves encouraging results in the nonblind deblurring problem. However, since the relationship between images and the network architectures is unclear, it is hard to find a suitable architecture to provide sufficient constraints on the estimated blur kernels and clean images. Also, DIP uses the sparse maximum a posteriori (MAP), which is insufficient to enforce the selection of the recovery image. Recently, variational deep image prior (VDIP) was proposed to impose constraints on both blur kernels and recovery images and take the standard deviation of the image into account during the optimization process by the variational principle. However, we empirically find that VDIP struggles with processing image details and tends to generate suboptimal results when the blur kernel is large. Therefore, we combine total generalized variation (TGV) regularization with VDIP in this paper to overcome these shortcomings of VDIP. TGV is a flexible regularization that utilizes the characteristics of partial derivatives of varying orders to regularize images at different scales, reducing oil painting artifacts while maintaining sharp edges. The proposed VDIP-TGV effectively recovers image edges and details by supplementing extra gradient information through TGV. Additionally, this model is solved by the alternating direction method of multipliers (ADMM), which effectively combines traditional algorithms and deep learning methods. Experiments show that our proposed VDIP-TGV surpasses various state-of-the-art models quantitatively and qualitatively.

Submitted 10 November, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

Comments: 13 pages, 5 figures

arXiv:2310.14251 [pdf, other]

FAS-assisted NOMA Short-Packet Communication Systems

Authors: Jianchao Zheng, Tuo Wu, Xiazhi Lai, Cunhua Pan, Maged Elkashlan, Kai-Kit Wong

Abstract: In this paper, we investigate a fluid antenna system (FAS)-assisted downlink non-orthogonal multiple access (NOMA) for short-packet communications. The base station (BS) adopts a single fixed antenna, while both the central user (CU) and the cell-edge user (CEU) are equipped with a FAS. Each FAS comprises $N$ flexible positions (also known as ports), linked to $N$ arbitrarily correlated Rayleigh fading channels. We derive expressions for the average block error rate (BLER) of the FAS-assisted NOMA system and provide asymptotic BLER expressions. We determine that the diversity order for CU and CEU is $N$, indicating that the system performance can be considerably improved by increasing $N$. Simulation results validate the great performance of FAS.

Submitted 22 October, 2023; originally announced October 2023.

Comments: Submitted to IEEE journal

arXiv:2310.07550 [pdf, other]

Proactive Monitoring via Jamming in Fluid Antenna Systems

Authors: Junteng Yao, Tuo Wu, Xiazhi Lai, Ming **, Cunhua Pan, Maged Elkashlan, Kai-Kit Wong

Abstract: This paper investigates the efficacy of utilizing fluid antenna system (FAS) at a legitimate monitor to oversee suspicious communication. The monitor switches the antenna position to minimize its outage probability for enhancing the monitoring performance. Our objective is to maximize the average monitoring rate, whose expression involves the integral of the first-order Marcum $Q$ function. The optimization problem, as initially posed, is non-convex owing to its objective function. Nevertheless, upon substituting with an upper bound, we provide a theoretical foundation confirming the existence of a unique optimal solution for the modified problem, achievable efficiently by the bisection search method. Furthermore, we also introduce a locally closed-form optimal resolution for maximizing the average monitoring rate. Empirical evaluations confirm that the proposed schemes outperform conventional benchmarks considerably.

Submitted 11 October, 2023; originally announced October 2023.

Comments: 3 figs, submitted to IEEE journal

arXiv:2310.05810 [pdf]

doi 10.1007/978-3-031-39062-3_12

Dipole-Spread Function Engineering for 6D Super-Resolution Microscopy

Authors: Tingting Wu, Matthew D. Lew

Abstract: Fluorescent molecules are versatile nanoscale emitters that enable detailed observations of biophysical processes with nanoscale resolution. Because they are well-approximated as electric dipoles, imaging systems can be designed to visualize their 3D positions and 3D orientations, so-called dipole-spread function (DSF) engineering, for 6D super-resolution single-molecule orientation-localization microscopy (SMOLM). We review fundamental image-formation theory for fluorescent dipoles, as well as how phase and polarization modulation can be used to change the image of a dipole emitter produced by a microscope, called its DSF. We describe several methods for designing these modulations for optimum performance, as well as compare recently developed techniques, including the double-helix, tetrapod, crescent, and DeepSTORM3D learned point-spread functions (PSFs), in addition to the tri-spot, vortex, pixOL, raPol, CHIDO, and MVR DSFs. We also cover common imaging system designs and techniques for implementing engineered DSFs. Finally, we discuss recent biological applications of 6D SMOLM and future challenges for pushing the capabilities and utility of the technology.

Submitted 9 October, 2023; originally announced October 2023.

Comments: This is a preprint of the following chapter: Tingting Wu and Matthew D. Lew, Dipole-Spread Function Engineering for 6D Super-Resolution Microscopy, published in Coded optical imaging, edited by **yang Lang, 2023, Springer Nature reproduced with permission of Springer Nature

Journal ref: Liang, J. (ed) Coded Optical Imaging. Springer, Cham. (2024)

arXiv:2310.04961 [pdf, ps, other]

Reach-avoid Analysis for Sampled-data Systems with Measurement Uncertainties

Authors: Taoran Wu, De** Ren, Shuyuan Zhang, Lei Wang, Bai Xue

Abstract: Digital control has become increasingly prevalent in modern systems, making continuous-time plants controlled by discrete-time (digital) controllers ubiquitous and crucial across industries, including aerospace, automotive, and manufacturing. This paper focuses on investigating the reach-avoid problem in such systems, where the objective is to reach a goal set while avoiding unsafe states, especially in the presence of state measurement uncertainties. We propose an approach that builds upon the concept of exponential control guidance barrier functions, originally used for synthesizing continuous-time feedback controllers. We introduce a sufficient condition that, if met by a given continuous-time feedback controller, ensures the safe guidance of the system into the goal set in its sampled-data implementation, despite state measurement uncertainties. The event of reaching the goal set is determined based on state measurements obtained at the sampling time instants. Numerical examples are provided to demonstrate the validity of our theoretical developments, showcasing successful implementation in solving the reach-avoid problem in sampled-data systems with state measurement uncertainties.

Submitted 7 October, 2023; originally announced October 2023.

arXiv:2309.10379 [pdf, ps, other]

PDPCRN: Parallel Dual-Path CRN with Bi-directional Inter-Branch Interactions for Multi-Channel Speech Enhancement

Authors: Jiahui Pan, Shulin He, Tianci Wu, Hui Zhang, Xueliang Zhang

Abstract: Multi-channel speech enhancement seeks to utilize spatial information to distinguish target speech from interfering signals. While deep learning approaches like the dual-path convolutional recurrent network (DPCRN) have made strides, challenges persist in effectively modeling inter-channel correlations and amalgamating multi-level information. In response, we introduce the Parallel Dual-Path Convolutional Recurrent Network (PDPCRN). This acoustic modeling architecture has two key innovations. First, a parallel design with separate branches extracts complementary features. Second, bi-directional modules enable cross-branch communication. Together, these facilitate diverse representation fusion and enhanced modeling. Experimental validation on TIMIT datasets underscores the prowess of PDPCRN. Notably, against baseline models like the standard DPCRN, PDPCRN not only outperforms in PESQ and STOI metrics but also boasts a leaner computational footprint with reduced parameters.

Submitted 19 September, 2023; originally announced September 2023.

arXiv:2309.08895 [pdf, other]

CDDM: Channel Denoising Diffusion Models for Wireless Semantic Communications

Authors: Tong Wu, Zhiyong Chen, Dazhi He, Liang Qian, Yin Xu, Meixia Tao, Wenjun Zhang

Abstract: Diffusion models (DM), which can gradually learn to remove noise, have been widely used in artificial intelligence generated content (AIGC) in recent years. This noise-eliminating property of DM leads us to wonder whether DM can be applied to wireless communications to help the receiver mitigate the channel noise. To address this, we propose channel denoising diffusion models (CDDM) for semantic communications over wireless channels in this paper. CDDM can be applied as a new physical layer module after the channel equalization to learn the distribution of the channel input signal, and then utilizes this learned knowledge to remove the channel noise. We derive corresponding training and sampling algorithms of CDDM according to the forward diffusion process specially designed to adapt the channel models and theoretically prove that the well-trained CDDM can effectively reduce the conditional entropy of the received signal under small sampling steps. Moreover, we apply CDDM to a semantic communications system based on joint source-channel coding (JSCC) for image transmission. Extensive experimental results demonstrate that CDDM can further reduce the mean square error (MSE) after the minimum mean square error (MMSE) equalizer, and the joint CDDM and JSCC system achieves better performance than the JSCC system and the traditional JPEG2000 with low-density parity-check (LDPC) code approach.

Submitted 16 September, 2023; originally announced September 2023.

Comments: submitted to IEEE Transactions on Wireless Communications. arXiv admin note: substantial text overlap with arXiv:2305.09161

arXiv:2309.07582 [pdf, other]

On Performance of Fluid Antenna System using Maximum Ratio Combining

Authors: Xiazhi Lai, Tuo Wu, Junteng Yao, Cunhua Pan, Maged Elkashlan, Kai-Kit Wong

Abstract: This letter investigates a fluid antenna system (FAS) where multiple ports can be activated for signal combining for enhanced receiver performance. Given $M$ ports at the FAS, the best $K$ ports out of the $M$ available ports are selected before maximum ratio combining (MRC) is used to combine the received signals from the selected ports. The aim of this letter is to study the achievable performance of FAS when more than one port can be activated. We do so by analyzing the outage probability of this setup in Rayleigh fading channels through the utilization of Gauss-Chebyshev integration, lower bound estimation, and high signal-to-noise ratio (SNR) asymptotic approximations. Our analytical results demonstrate that FAS can harness rich spatial diversity, which is confirmed by computer simulations.

Submitted 14 September, 2023; originally announced September 2023.

Comments: submitted to IEEE journal

arXiv:2306.05578 [pdf, other]

Differential Privacy for Class-based Data: A Practical Gaussian Mechanism

Authors: Raksha Ramakrishna, Anna Scaglione, Tong Wu, Nikhil Ravi, Sean Peisert

Abstract: In this paper, we present a notion of differential privacy (DP) for data that comes from different classes. Here, the class-membership is private information that needs to be protected. The proposed method is an output perturbation mechanism that adds noise to the release of query response such that the analyst is unable to infer the underlying class-label. The proposed DP method is capable of not only protecting the privacy of class-based data but also meets quality metrics of accuracy and is computationally efficient and practical. We illustrate the efficacy of the proposed method empirically while outperforming the baseline additive Gaussian noise mechanism. We also examine a real-world application and apply the proposed DP method to the autoregression and moving average (ARMA) forecasting method, protecting the privacy of the underlying data source. Case studies on the real-world advanced metering infrastructure (AMI) measurements of household power consumption validate the excellent performance of the proposed DP method while also satisfying the accuracy of forecasted power consumption measurements.

Submitted 8 June, 2023; originally announced June 2023.

Comments: Under review in IEEE Transactions on Information Forensics & Security

arXiv:2305.18146 [pdf]

A Hierarchical Context-aware Modeling Approach for Multi-aspect and Multi-granular Pronunciation Assessment

Authors: Fu-An Chao, Tien-Hong Lo, Tzu-I Wu, Yao-Ting Sung, Berlin Chen

Abstract: Automatic Pronunciation Assessment (APA) plays a vital role in Computer-assisted Pronunciation Training (CAPT) when evaluating a second language (L2) learner's speaking proficiency. However, an apparent downside of most de facto methods is that they parallelize the modeling process throughout different speech granularities without accounting for the hierarchical and local contextual relationships among them. In light of this, a novel hierarchical approach is proposed in this paper for multi-aspect and multi-granular APA. Specifically, we first introduce the notion of sup-phonemes to explore more subtle semantic traits of L2 speakers. Second, a depth-wise separable convolution layer is exploited to better encapsulate the local context cues at the sub-word level. Finally, we use a score-restraint attention pooling mechanism to predict the sentence-level scores and optimize the component models with a multitask learning (MTL) framework. Extensive experiments carried out on a publicly-available benchmark dataset, viz. speechocean762, demonstrate the efficacy of our approach in relation to some cutting-edge baselines.

Submitted 7 June, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

Comments: Accepted to Interspeech 2023

arXiv:2305.10983 [pdf, other]

Assessor360: Multi-sequence Network for Blind Omnidirectional Image Quality Assessment

Authors: Tianhe Wu, Shuwei Shi, Haoming Cai, Mingdeng Cao, **g Xiao, Yinqiang Zheng, Yujiu Yang

Abstract: Blind Omnidirectional Image Quality Assessment (BOIQA) aims to objectively assess the human perceptual quality of omnidirectional images (ODIs) without relying on pristine-quality image information. It is becoming more significant with the increasing advancement of virtual reality (VR) technology. However, the quality assessment of ODIs is severely hampered by the fact that the existing BOIQA pipeline lacks the modeling of the observer's browsing process. To tackle this issue, we propose a novel multi-sequence network for BOIQA called Assessor360, which is derived from the realistic multi-assessor ODI quality assessment procedure. Specifically, we propose a generalized Recursive Probability Sampling (RPS) method for the BOIQA task, combining content and details information to generate multiple pseudo-viewport sequences from a given starting point. Additionally, we design a Multi-scale Feature Aggregation (MFA) module with a Distortion-aware Block (DAB) to fuse distorted and semantic features of each viewport. We also devise a Temporal Modeling Module (TMM) to learn the viewport transition in the temporal domain. Extensive experimental results demonstrate that Assessor360 outperforms state-of-the-art methods on multiple OIQA datasets. The code and models are available at https://github.com/TianheWu/Assessor360.

Submitted 10 October, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

arXiv:2305.09165 [pdf, other]

Fusion-Based Multi-User Semantic Communications for Wireless Image Transmission over Degraded Broadcast Channels

Authors: Tong Wu, Zhiyong Chen, Meixia Tao, Bin Xia, Wenjun Zhang

Abstract: Degraded broadcast channels (DBC) are a typical multi-user communications scenario. There exist classic transmission methods, such as superposition coding with successive interference cancellation, to achieve the DBC capacity region. However, semantic communication methods over DBC still lack in-depth research. To address this, we design a fusion-based multi-user semantic communications system for wireless image transmission over DBC in this paper. The proposed architecture supports a transmitter extracting semantic features for two users separately, and learns to dynamically fuse these semantic features into a joint latent representation for broadcasting. The key here is to design a flexible image semantic fusion (FISF) module to fuse the semantic features of two users, and to use a multi-layer perceptron (MLP) based neural network to adjust the weights of different user semantic features for flexible adaptability to different users' channels. Experiments present the semantic performance region based on the peak signal-to-noise ratio (PSNR) of both users, and show that the proposed system dominates the traditional methods.

Submitted 16 May, 2023; originally announced May 2023.

arXiv:2305.09161 [pdf, other]

CDDM: Channel Denoising Diffusion Models for Wireless Communications

Authors: Tong Wu, Zhiyong Chen, Dazhi He, Liang Qian, Yin Xu, Meixia Tao, Wenjun Zhang

Abstract: Diffusion models (DM), which can gradually learn to remove noise, have been widely used in artificial intelligence generated content (AIGC) in recent years. This noise-removing property of DM leads us to wonder whether DM can be applied to wireless communications to help the receiver eliminate the channel noise. To address this, we propose channel denoising diffusion models (CDDM) for wireless communications in this paper. CDDM can be applied as a new physical layer module after the channel equalization to learn the distribution of the channel input signal, and then utilizes this learned knowledge to remove the channel noise. We design corresponding training and sampling algorithms for the forward diffusion process and the reverse sampling process of CDDM. Moreover, we apply CDDM to a semantic communications system based on joint source-channel coding (JSCC). Experimental results demonstrate that CDDM can further reduce the mean square error (MSE) after the minimum mean square error (MMSE) equalizer, and the joint CDDM and JSCC system achieves better performance than the JSCC system and the traditional JPEG2000 with low-density parity-check (LDPC) code approach.

Submitted 16 May, 2023; originally announced May 2023.

arXiv:2304.11550 [pdf, other]

Provable Reach-avoid Controllers Synthesis Based on Inner-approximating Controlled Reach-avoid Sets

Authors: Jianqiang Ding, Taoran Wu, Yu** Qian, Lijun Zhang, Bai Xue

Abstract: In this paper, we propose an approach for synthesizing provable reach-avoid controllers, which drive a deterministic system operating in an unknown environment to safely reach a desired target set. The approach falls within the reachability analysis framework and is based on the computation of inner-approximations of controlled reach-avoid sets (CRSs). Given a target set and a safe set, the controlled reach-avoid set is the set of states such that starting from each of them, there exists at least one controller to ensure that the system can enter the target set while staying inside the safe set before the target hitting time. Therefore, the boundary of the controlled reach-avoid set acts as a barrier, separating states capable of achieving the reach-avoid objective from those that are not, and thus the computed inner-approximation provides a viable space for the system to achieve the reach-avoid objective. Our approach for synthesizing reach-avoid controllers mainly consists of three steps. We first learn a safe set of states in the unknown environment from sensor measurements based on a support vector machine approach. Then, based on the learned safe set and target set, we compute an inner-approximation of the CRS. Finally, we synthesize controllers online to ensure that the system will reach the target set by evolving inside the computed inner-approximation. The proposed method is demonstrated on a Dubins car system.

Submitted 23 April, 2023; originally announced April 2023.

arXiv:2304.08298 [pdf, other]

Implicit Bayes Adaptation: A Collaborative Transport Approach

Authors: Bo Jiang, Hamid Krim, Tianfu Wu, Derya Cansever

Abstract: The power and flexibility of Optimal Transport (OT) have pervaded a wide spectrum of problems, including recent Machine Learning challenges such as unsupervised domain adaptation. Its essence of quantitatively relating two probability distributions by some optimal metric has been creatively exploited and shown to hold promise for many real-world data challenges. In a related theme in the present work, we posit that domain adaptation robustness is rooted in the intrinsic (latent) representations of the respective data, which inherently lie in a non-linear submanifold embedded in a higher dimensional Euclidean space. We account for the geometric properties by refining the $l^2$ Euclidean metric to better reflect the geodesic distance between two distinct representations. We integrate a metric correction term as well as a prior cluster structure in the source data of the OT-driven adaptation. We show that this is tantamount to an implicit Bayesian framework, which we demonstrate to be viable for a more robust and better-performing approach to domain adaptation. Substantiating experiments are also included for validation purposes.

Submitted 17 April, 2023; originally announced April 2023.

arXiv:2304.05482 [pdf, other]

Computational Pathology: A Survey Review and The Way Forward

Authors: Mahdi S. Hosseini, Babak Ehteshami Bejnordi, Vincent Quoc-Huy Trinh, Danial Hasan, Xingwen Li, Taehyo Kim, Haochen Zhang, Theodore Wu, Kajanan Chinniah, Sina Maghsoudlou, Ryan Zhang, Stephen Yang, Jiadai Zhu, Lyndon Chan, Samir Khaki, Andrei Buin, Fatemeh Chaji, Ala Salehi, Bich Ngoc Nguyen, Dimitris Samaras, Konstantinos N. Plataniotis

Abstract: Computational Pathology (CPath) is an interdisciplinary science that augments developments of computational approaches to analyze and model medical histopathology images. The main objective for CPath is to develop infrastructure and workflows of digital diagnostics as an assistive CAD system for clinical pathology, facilitating transformational changes in the diagnosis and treatment of cancer that are mainly addressed by CPath tools. With ever-growing developments in deep learning and computer vision algorithms, and the ease of the data flow from digital pathology, CPath is currently witnessing a paradigm shift. Despite the sheer volume of engineering and scientific works being introduced for cancer image analysis, there is still a considerable gap in adopting and integrating these algorithms in clinical practice. This raises a significant question regarding the direction and trends that are undertaken in CPath. In this article we provide a comprehensive review of more than 800 papers to address the challenges faced from problem design all the way to the application and implementation viewpoints. We have catalogued each paper into a model-card by examining the key works and challenges faced to lay out the current landscape in CPath. We hope this helps the community to locate relevant works and facilitate understanding of the field's future directions. In a nutshell, we oversee the CPath developments in a cycle of stages which are required to be cohesively linked together to address the challenges associated with such multidisciplinary science. We overview this cycle from different perspectives of data-centric, model-centric, and application-centric problems. We finally sketch remaining challenges and provide directions for future technical developments and clinical integration of CPath (https://github.com/AtlasAnalyticsLab/CPath_Survey).

Submitted 27 January, 2024; v1 submitted 11 April, 2023; originally announced April 2023.

Comments: Accepted in Elsevier Journal of Pathology Informatics (JPI) 2024

arXiv:2303.15206 [pdf, other]

Perceptual Quality Assessment of NeRF and Neural View Synthesis Methods for Front-Facing Views

Authors: Hanxue Liang, Tianhao Wu, Param Hanji, Francesco Banterle, Hongyun Gao, Rafal Mantiuk, Cengiz Oztireli

Abstract: Neural view synthesis (NVS) is one of the most successful techniques for synthesizing free viewpoint videos, capable of achieving high fidelity from only a sparse set of captured images. This success has led to many variants of the techniques, each evaluated on a set of test views typically using image quality metrics such as PSNR, SSIM, or LPIPS. There has been a lack of research on how NVS methods perform with respect to perceived video quality. We present the first study on perceptual evaluation of NVS and NeRF variants. For this study, we collected two datasets of scenes captured in a controlled lab environment as well as in-the-wild. In contrast to existing datasets, these scenes come with reference video sequences, allowing us to test for temporal artifacts and subtle distortions that are easily overlooked when viewing only static images. We measured the quality of videos synthesized by several NVS methods in a well-controlled perceptual quality assessment experiment as well as with many existing state-of-the-art image/video quality metrics. We present a detailed analysis of the results and recommendations for dataset and metric selection for NVS evaluation.

Submitted 24 October, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

arXiv:2303.10310 [pdf, other]

doi 10.1016/j.engappai.2023.107255

Domain-knowledge Inspired Pseudo Supervision (DIPS) for Unsupervised Image-to-Image Translation Models to Support Cross-Domain Classification

Authors: Firas Al-Hindawi, Md Mahfuzur Rahman Siddiquee, Teresa Wu, Han Hu, Ying Sun

Abstract: The ability to classify images is dependent on having access to large labeled datasets and testing on data from the same domain that the model can train on. Classification becomes more challenging when dealing with new data from a different domain, where gathering and especially labeling a larger image dataset for retraining a classification model requires a labor-intensive human effort. Cross-domain classification frameworks were developed to handle this data domain shift problem by utilizing unsupervised image-to-image translation models to translate an input image from the unlabeled domain to the labeled domain. The problem with these unsupervised models lies in their unsupervised nature. For lack of annotations, it is not possible to use the traditional supervised metrics to evaluate these translation models to pick the best-saved checkpoint model. This paper introduces a new method called Domain-knowledge Inspired Pseudo Supervision (DIPS), which utilizes domain-informed Gaussian Mixture Models to generate pseudo annotations to enable the use of traditional supervised metrics. This method was designed specifically to support cross-domain classification applications, in contrast to other typically used metrics such as the FID, which were designed to evaluate the model in terms of the quality of the generated image from a human-eye perspective. DIPS proves its effectiveness by outperforming various GAN evaluation metrics, including FID, when selecting the optimal saved checkpoint model. It is also evaluated against truly supervised metrics.
Furthermore, DIPS showcases its robustness and interpretability by demonstrating a strong correlation with truly supervised metrics, highlighting its superiority over existing state-of-the-art alternatives. The code and data to replicate the results can be found on the official GitHub repository: https://github.com/Hindawi91/DIPS

Submitted 30 September, 2023; v1 submitted 17 March, 2023; originally announced March 2023.

Comments: arXiv admin note: text overlap with arXiv:2212.09107

arXiv:2302.10382 [pdf, other]

Constrained Reinforcement Learning for Predictive Control in Real-Time Stochastic Dynamic Optimal Power Flow

Authors: Tong Wu, Anna Scaglione, Daniel Arnold

Abstract: Deep Reinforcement Learning (DRL) has become a popular method for solving control problems in power systems. Conventional DRL encourages the agent to explore various policies encoded in a neural network (NN) with the goal of maximizing the reward function. However, this approach can lead to infeasible solutions that violate physical constraints such as power flow equations, voltage limits, and dynamic constraints. Ensuring these constraints are met is crucial in power systems, as they are a safety-critical infrastructure. To address this issue, existing DRL algorithms remedy the problem by projecting the actions onto the feasible set, which can result in sub-optimal solutions. This paper presents a novel primal-dual approach for learning optimal constrained DRL policies for dynamic optimal power flow problems, with the aim of controlling power generations and battery outputs. We also prove the convergence of the critic and actor networks. Our case studies on IEEE standard systems demonstrate the superiority of the proposed approach in dynamically adapting to the environment while maintaining safety constraints.

Submitted 7 July, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2302.09200 [pdf, other]

Brainomaly: Unsupervised Neurologic Disease Detection Utilizing Unannotated T1-weighted Brain MR Images

Authors: Md Mahfuzur Rahman Siddiquee, Jay Shah, Teresa Wu, Catherine Chong, Todd J. Schwedt, Gina Dumkrieger, Simona Nikolova, Baoxin Li

Abstract: Harnessing the power of deep neural networks in the medical imaging domain is challenging due to the difficulties in acquiring large annotated datasets, especially for rare diseases, which involve high costs, time, and effort for annotation. Unsupervised disease detection methods, such as anomaly detection, can significantly reduce human effort in these scenarios. While anomaly detection typically focuses on learning from images of healthy subjects only, real-world situations often present unannotated datasets with a mixture of healthy and diseased subjects. Recent studies have demonstrated that utilizing such unannotated images can improve unsupervised disease and anomaly detection. However, these methods do not utilize knowledge specific to registered neuroimages, resulting in a subpar performance in neurologic disease detection. To address this limitation, we propose Brainomaly, a GAN-based image-to-image translation method specifically designed for neurologic disease detection. Brainomaly not only offers tailored image-to-image translation suitable for neuroimages but also leverages unannotated mixed images to achieve superior neurologic disease detection. Additionally, we address the issue of model selection for inference without annotated samples by proposing a pseudo-AUC metric, further enhancing Brainomaly's detection performance.
Extensive experiments and ablation studies demonstrate that Brainomaly outperforms existing state-of-the-art unsupervised disease and anomaly detection methods by significant margins in Alzheimer's disease detection using a publicly available dataset and headache detection using an institutional dataset. The code is available from https://github.com/mahfuzmohammad/Brainomaly.

Submitted 16 August, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

Comments: Accepted in WACV 2024

arXiv:2212.09107 [pdf, other]

doi 10.1016/j.eswa.2023.120265

A Framework for Generalizing Critical Heat Flux Detection Models Using Unsupervised Image-to-Image Translation

Authors: Firas Al-Hindawi, Tejaswi Soori, Han Hu, Md Mahfuzur Rahman Siddiquee, Hyunsoo Yoon, Teresa Wu, Ying Sun

Abstract: The detection of critical heat flux (CHF) is crucial in heat boiling applications, as failure to do so can cause a rapid temperature ramp leading to device failures. Many machine learning models exist to detect CHF, but their performance degrades significantly when tested on data from different domains. To deal with datasets from new domains, a model needs to be trained from scratch. Moreover, the dataset needs to be annotated by a domain expert. To address this issue, we propose a new framework to support the generalizability and adaptability of trained CHF detection models in an unsupervised manner. This approach uses an unsupervised Image-to-Image (UI2I) translation model to transform images in the target dataset to look like they were obtained from the same domain the model was previously trained on. Unlike other frameworks dealing with domain shift, our framework does not require retraining or fine-tuning of the trained classification model, nor does it require synthesized datasets in the training process of either the classification model or the UI2I model. The framework was tested on three boiling datasets from different domains, and we show that the CHF detection model trained on one dataset was able to generalize to the other two previously unseen datasets with high accuracy. Overall, the framework enables CHF detection models to adapt to data generated from different domains without requiring additional annotation effort or retraining of the model.

Submitted 17 March, 2023; v1 submitted 18 December, 2022; originally announced December 2022.

Comments: This work has been submitted to the Expert Systems With Applications Journal on Sep 25, 2022

arXiv:2212.00611 [pdf, other]

Ultraviolet Scattering Communication Using Subcarrier Intensity Modulation over Atmospheric Turbulence Channels

Authors: Zanqiu Shen, Jianshe Ma, Tianfeng Wu, Tao Shan, Yupeng Chen, ** Su

Abstract: A closed-form non-line-of-sight (NLOS) turbulence-induced fluctuation model is derived for ultraviolet scattering communication (USC), which models the received irradiance fluctuation by the Meijer G function. Based on this model, we investigate the error rates of the USC system in the NLOS case using different modulation techniques. Closed-form error rate results are derived by integration of the Meijer G function. Inspired by the decomposition of different turbulence parameters, we use a series expansion of the hypergeometric function and obtain the error rate expressions as the sum of four infinite series. The numerical results show that our error rate results are accurate in the NLOS case. We also study the relationship between the turbulence influence and NLOS transceiver configurations. The numerical results show that when two-LOS link formulates the same distance, the turbulence influence is the strongest for long ranges and the weakest for short ranges.

Submitted 1 December, 2022; originally announced December 2022.

arXiv:2211.16044 [pdf, other]

Model Extraction Attack against Self-supervised Speech Models

Authors: Tsu-Yuan Hsu, Chen-An Li, Tung-Yu Wu, Hung-yi Lee

Abstract: Self-supervised learning (SSL) speech models generate meaningful representations of given clips and achieve incredible performance across various downstream tasks. Model extraction attack (MEA) often refers to an adversary stealing the functionality of the victim model with only query access. In this work, we study the MEA problem against SSL speech models with a small number of queries. We propose a two-stage framework to extract the model. In the first stage, SSL is conducted on a large-scale unlabeled corpus to pre-train a small speech model. In the second stage, we actively sample a small portion of clips from the unlabeled corpus and query the target model with these clips to acquire their representations as labels for the small model's second-stage training. Experimental results show that our sampling methods can effectively extract the target model without knowing any information about its model architecture.

Submitted 8 October, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

arXiv:2210.11998 [pdf, other]

Fingerprint Based mmWave Positioning System Aided by Reconfigurable Intelligent Surface

Authors: Tuo Wu, Cunhua Pan, Yi** Pan, Hong Ren, Maged Elkashlan, Cheng-Xiang Wang

Abstract: Reconfigurable intelligent surface (RIS) is a promising technique for millimeter wave (mmWave) positioning systems. In this paper, we consider the positioning problem for multiple mobile users (MUs) in multiple-input multiple-output (MIMO) time-division duplex (TDD) mmWave systems aided by the RIS. We derive the expression for the space-time channel response vector (STCRV) as a novel type of fingerprint. The STCRV consists of the multipath channel characteristics, e.g., time delay and angle of arrival (AOA), which are related to the position of the MU. By using the STCRV as input, we propose a novel residual convolution network regression (RCNR) learning algorithm to output the estimated three-dimensional (3D) position of the MU. Specifically, the RCNR learning algorithm includes a data processing block to process the input STCRV, a normal convolution block to extract the features of the STCRV, four residual convolution blocks to further extract the features and protect the integrity of the features, and a regression block to estimate the 3D position. Extensive simulation results are also presented to demonstrate that the proposed RCNR learning algorithm outperforms the traditional convolution neural network (CNN).

Submitted 21 October, 2022; originally announced October 2022.

Comments: 5 pages, 9 figures

arXiv:2210.05436 [pdf, other]

Retinex Image Enhancement Based on Sequential Decomposition With a Plug-and-Play Framework

Authors: Tingting Wu, Wenna Wu, Ying Yang, Feng-Lei Fan, Tieyong Zeng

Abstract: The Retinex model is one of the most representative and effective methods for low-light image enhancement. However, the Retinex model does not explicitly tackle the noise problem, and shows unsatisfactory enhancing results. In recent years, due to their excellent performance, deep learning models have been widely used in low-light image enhancement. However, these methods have two limitations: i) The desirable performance can only be achieved by deep learning when a large amount of labeled data is available. However, it is not easy to curate massive low/normal-light paired data; ii) Deep learning is notoriously a black-box model [1]. It is difficult to explain its inner-working mechanism and understand its behavior. In this paper, using a sequential Retinex decomposition strategy, we design a plug-and-play framework based on the Retinex theory for simultaneous image enhancement and noise removal. Meanwhile, we integrate a convolutional neural network-based (CNN-based) denoiser into our proposed plug-and-play framework to generate a reflectance component. The final enhanced image is produced by integrating the illumination and reflectance with gamma correction. The proposed plug-and-play framework can facilitate both post hoc and ad hoc interpretability. Extensive experiments on different datasets demonstrate that our framework outcompetes the state-of-the-art methods in both image enhancement and denoising.

Submitted 17 February, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

arXiv:2209.12900 [pdf, other]

The Efficacy of Self-Supervised Speech Models for Audio Representations

Authors: Tung-Yu Wu, Chen-An Li, Tzu-Han Lin, Tsu-Yuan Hsu, Hung-Yi Lee

Abstract: Self-supervised learning (SSL) speech models, which can serve as powerful upstream models to extract meaningful speech representations, have achieved unprecedented success in speech representation learning. However, their effectiveness on non-speech datasets is relatively less explored. In this work, we propose an ensemble framework, with a combination of ensemble techniques, to fuse SSL speech models' embeddings. Extensive experiments on speech and non-speech audio datasets are conducted to investigate the representation abilities of our ensemble method and its single constituent model. Ablation studies are carried out to evaluate the performance of different ensemble techniques, such as feature averaging and concatenation. All experiments are conducted during the NeurIPS 2021 HEAR Challenge, using the standard evaluation pipeline provided by competition officials. Results demonstrate SSL speech models' strong abilities on various non-speech tasks, while we also note that they fail to deal with fine-grained music tasks, such as pitch classification and note onset detection. In addition, feature ensemble is shown to have great potential for producing more holistic representations, as our proposed framework generally surpasses state-of-the-art SSL speech/audio models and has superior performance on various datasets compared with other teams in the HEAR Challenge. Our code is available at https://github.com/tony10101105/HEAR-2021-NeurIPS-Challenge -- NTU-GURA.

Submitted 31 January, 2023; v1 submitted 26 September, 2022; originally announced September 2022.

Comments: to appear in Proceedings of Machine Learning Research (PMLR): NeurIPS 2021 Competition Track

arXiv:2209.01822 [pdf, other]

HealthyGAN: Learning from Unannotated Medical Images to Detect Anomalies Associated with Human Disease

Authors: Md Mahfuzur Rahman Siddiquee, Jay Shah, Teresa Wu, Catherine Chong, Todd Schwedt, Baoxin Li

Abstract: Automated anomaly detection from medical images, such as MRIs and X-rays, can significantly reduce human effort in disease diagnosis. Owing to the complexity of modeling anomalies and the high cost of manual annotation by domain experts (e.g., radiologists), a typical technique in the current medical imaging literature has focused on deriving diagnostic models from healthy subjects only, assuming the model will detect the images from patients as outliers. However, in many real-world scenarios, unannotated datasets with a mix of both healthy and diseased individuals are abundant. Therefore, this paper poses the research question of how to improve unsupervised anomaly detection by utilizing (1) an unannotated set of mixed images, in addition to (2) the set of healthy images as being used in the literature. To answer the question, we propose HealthyGAN, a novel one-directional image-to-image translation method, which learns to translate the images from the mixed dataset to only healthy images. Being one-directional, HealthyGAN relaxes the requirement of cycle consistency of existing unpaired image-to-image translation methods, which is unattainable with mixed unannotated data. Once the translation is learned, we generate a difference map for any given image by subtracting its translated output. Regions of significant responses in the difference map correspond to potential anomalies (if any).
Our HealthyGAN outperforms the conventional state-of-the-art methods by significant margins on two publicly available datasets: COVID-19 and NIH ChestX-ray14, and one institutional dataset collected from Mayo Clinic. The implementation is publicly available at https://github.com/mahfuzmohammad/HealthyGAN.

Submitted 5 September, 2022; originally announced September 2022.

Comments: International Workshop on Simulation and Synthesis in Medical Imaging, MICCAI, 2022

arXiv:2209.00992 [pdf, other]

Single-scatter channel impulse response model of non-line-of-sight ultraviolet communications

Authors: Tian Cao, Shihan Chen, Tianfeng Wu, Changyong Pan, Jian Song

Abstract: Previous studies on the temporal characteristics of single-scatter transmission in non-line-of-sight (NLOS) ultraviolet communications (UVC) were based on the prolate-spheroidal coordinate system. In this work, a novel single-scatter channel impulse response (CIR) model is proposed in the spherical coordinate system, which is more natural and comprehensible than the prolate-spheroidal coordinate system in practical applications. Additionally, the results of the widely accepted Monte-Carlo (MC)-based channel model of NLOS UVC are provided to verify the proposed single-scatter CIR model. Results indicate that the computational time required by the proposed single-scatter CIR model is less than 0.7% of that of the MC-based one, with comparable accuracy in assessing the temporal characteristics of NLOS UVC channels.

Submitted 28 August, 2022; originally announced September 2022.

Comments: 10 pages, 4 figures

arXiv:2208.09110 [pdf]

3M: An Effective Multi-view, Multi-granularity, and Multi-aspect Modeling Approach to English Pronunciation Assessment

Authors: Fu-An Chao, Tien-Hong Lo, Tzu-I Wu, Yao-Ting Sung, Berlin Chen

Abstract: As an indispensable ingredient of computer-assisted pronunciation training (CAPT), automatic pronunciation assessment (APA) plays a pivotal role in aiding self-directed language learners by providing multi-aspect and timely feedback. However, there are at least two potential obstacles that might hinder its performance for practical use. On one hand, most of the studies focus exclusively on leveraging segmental (phonetic)-level features such as goodness of pronunciation (GOP); this, however, may cause a discrepancy of feature granularity when performing suprasegmental (prosodic)-level pronunciation assessment. On the other hand, automatic pronunciation assessments still suffer from the lack of large-scale labeled speech data of non-native speakers, which inevitably limits the performance of pronunciation assessment. In this paper, we tackle these problems by integrating multiple prosodic and phonological features to provide a multi-view, multi-granularity, and multi-aspect (3M) pronunciation modeling. Specifically, we augment GOP with prosodic and self-supervised learning (SSL) features, and meanwhile develop a vowel/consonant positional embedding for a more phonology-aware automatic pronunciation assessment. A series of experiments conducted on the publicly-available speechocean762 dataset show that our approach can obtain significant improvements on several assessment granularities in comparison with previous work, especially on the assessment of speaking fluency and speech prosody.

Submitted 11 September, 2022; v1 submitted 18 August, 2022; originally announced August 2022.

Comments: Accepted to APSIPA ASC 2022

arXiv:2208.08485 [pdf, other]

Complex-Value Spatio-temporal Graph Convolutional Neural Networks and its Applications to Electric Power Systems AI

Authors: Tong Wu, Anna Scaglione, Daniel Arnold

Abstract: The effective representation, processing, analysis, and visualization of large-scale structured data over graphs are gaining a lot of attention. So far most of the literature has focused on real-valued signals. However, signals are often sparse in the Fourier domain, and more informative and compact representations for them can be obtained using the complex envelope of their spectral components, as opposed to the original real-valued signals. Motivated by this fact, in this work we generalize graph convolutional neural networks (GCN) to the complex domain, deriving the theory that allows incorporating complex-valued graph shift operators (GSO) in the definition of graph filters (GF) and processing complex-valued graph signals (GS). The theory developed can handle spatio-temporal complex network processes. We prove that complex-valued GCNs are stable with respect to perturbations of the underlying graph support, the bound of the transfer error and the bound of error propagation through multiple layers. Then we apply complex GCN to power grid state forecasting, power grid cyber-attack detection and localization.

Submitted 22 September, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2208.07606 [pdf, ps, other]

RIS-Aided Localization Algorithm and Analysis: Tackling Non-Gaussian Angle Estimation Errors

Authors: Tuo Wu, Hong Ren, Cunhua Pan, Yi** Pan, Sheng Hong, Maged Elkashlan, Feng Shu, Jiangzhou Wang

Abstract: Reconfigurable intelligent surface (RIS)-aided localization systems are increasingly recognized for enhancing accuracy in internet of things (IoT) networks. However, prevailing studies tend to either assume a Gaussian distribution for angle estimation error (AEE) or directly neglect the impact of the AEE, overlooking its non-Gaussian nature in real-world scenarios, particularly with diverse estimation methods (e.g., 2D-DFT algorithm). Addressing this oversight, this paper explores the design and performance analysis of RIS-aided localization systems, specifically tackling non-Gaussian AEE. We adopt the classical two-step three-dimensional (3D) localization scheme to determine the position of the mobile user (MU). Initially, we estimate angles of arrival (AoAs) and time differences of arrival (TDoAs) at the RIS using different methods, resulting in non-Gaussian and Gaussian errors, respectively. Subsequently, to accommodate the non-Gaussian nature of AoA errors and the Gaussian character of TDoA errors, we design a multiple weighted least squares (mWLS) algorithm to accurately localize the MU. In addition, our research includes a unique bias analysis for evaluating the performance of the proposed localization algorithm under both Gaussian and non-Gaussian errors. Simulation results demonstrate the effectiveness of both the proposed mWLS algorithm and the bias analysis methodology.

Submitted 18 March, 2024; v1 submitted 16 August, 2022; originally announced August 2022.

Comments: Keywords: Reconfigurable intelligent surface (RIS), intelligent reflecting surface (IRS)

arXiv:2208.07602 [pdf, other]

Joint Angle Estimation Error Analysis and 3D Positioning Algorithm Design for mmWave Positioning System

Authors: Tuo Wu, Cunhua Pan, Yi** Pan, Sheng Hong, Hong Ren, Maged Elkashlan, Feng Shu, Jiangzhou Wang

Abstract: In this paper, we propose a comprehensive framework to jointly analyze the angle estimation error and design the three-dimensional (3D) positioning algorithm for a millimeter wave (mmWave) positioning system. First, we estimate the angles of arrival (AoAs) at the anchors by applying the two-dimensional discrete Fourier transform (2D-DFT) algorithm. Based on the property of the 2D-DFT algorithm, the angle estimation error is analyzed in terms of probability density functions (PDF). The analysis shows that the derived angle estimation error is non-Gaussian, which is different from the existing work. Second, the intricate expression of the PDF of the AoA estimation error is simplified by employing the first-order linear approximation of triangle functions. Then, we derive a complex expression for the variance based on the derived PDF. Specifically, for the azimuth estimation error, the variance is separately integrated according to the different non-zero intervals of the PDF. Finally, we apply the weighted least square (WLS) algorithm to estimate the 3D position of the MU by using the estimated AoAs and the obtained non-Gaussian variance. Extensive simulation results confirm that the derived angle estimation error is non-Gaussian, and also demonstrate the superiority of the proposed framework.

Submitted 14 November, 2022; v1 submitted 16 August, 2022; originally announced August 2022.

Comments: Keywords: mmWave Positioning System

arXiv:2205.08598 [pdf, other]

Deploying self-supervised learning in the wild for hybrid automatic speech recognition

Authors: Mostafa Karimi, Changliang Liu, Kenichi Kumatani, Yao Qian, Tianyu Wu, Jian Wu

Abstract: Self-supervised learning (SSL) methods have proven to be very successful in automatic speech recognition (ASR). These great improvements have been reported mostly based on highly curated datasets such as LibriSpeech for non-streaming End-to-End ASR models. However, the pivotal characteristics of SSL is to be utilized for any untranscribed audio data. In this paper, we provide a full exploration on how to utilize uncurated audio data in SSL from data pre-processing to deploying an streaming hybrid ASR model. More specifically, we present (1) the effect of Audio Event Detection (AED) model in data pre-processing pipeline (2) analysis on choosing optimizer and learning rate scheduling (3) comparison of recently developed contrastive losses, (4) comparison of various pre-training strategies such as utilization of in-domain versus out-domain pre-training data, monolingual versus multilingual pre-training data, multi-head multilingual SSL versus single-head multilingual SSL and supervised pre-training versus SSL. The experimental results show that SSL pre-training with in-domain uncurated data can achieve better performance in comparison to all the alternative out-domain pre-training strategies.

Submitted 17 May, 2022; originally announced May 2022.

Showing 1–50 of 89 results for author: Wu, T