-
Spatio-temporal Attention-based Hidden Physics-informed Neural Network for Remaining Useful Life Prediction
Authors:
Feilong Jiang,
Xiaonan Hou,
Min Xia
Abstract:
Predicting the Remaining Useful Life (RUL) is essential in Prognostic Health Management (PHM) for industrial systems. Although deep learning approaches have achieved considerable success in predicting RUL, low prediction accuracy and limited interpretability remain significant challenges that hinder their practical implementation. In this work, we introduce a Spatio-temporal Attention-based Hidden Physics-informed Neural Network (STA-HPINN) for RUL prediction, which can utilize the associated physics of the system degradation. The spatio-temporal attention mechanism can extract important features from the input data. With the self-attention mechanism on both the sensor dimension and time step dimension, the proposed model can effectively extract degradation information. The hidden physics-informed neural network is utilized to capture the physics mechanisms that govern the evolution of RUL. With the constraint of physics, the model can achieve higher accuracy and reasonable predictions. The approach is validated on a benchmark dataset, demonstrating exceptional performance when compared to cutting-edge methods, especially in the case of complex conditions.
Submitted 20 May, 2024;
originally announced May 2024.
-
Ensuring Safety at Intelligent Intersections: Temporal Logic Meets Reachability Analysis
Authors:
Kaj Munhoz Arfvidsson,
Frank J. Jiang,
Karl H. Johansson,
Jonas Mårtensson
Abstract:
In this work, we propose an approach for ensuring the safety of vehicles passing through an intelligent intersection. There are many proposals for the design of intelligent intersections that introduce central decision-makers to intersections for enhancing the efficiency and safety of the vehicles. To guarantee the safety of such designs, we develop a safety framework for intersections based on temporal logic and reachability analysis. We start by specifying the required behavior for all the vehicles that need to pass through the intersection as linear temporal logic formula. Then, using temporal logic trees, we break down the linear temporal logic specification into a series of Hamilton-Jacobi reachability analyses in an automated fashion. By successfully constructing the temporal logic tree through reachability analysis, we verify the feasibility of the intersection specification. By taking this approach, we enable a safety framework that is able to automatically provide safety guarantees on new intersection behavior specifications. To evaluate our approach, we implement the framework on a simulated T-intersection, where we show that we can check and guarantee the safety of vehicles with potentially conflicting paths.
Submitted 18 May, 2024;
originally announced May 2024.
-
Small-Scale Testbed for Evaluating C-V2X Applications on 5G Cellular Networks
Authors:
Kaj Munhoz Arfvidsson,
Kleio Fragkedaki,
Frank J. Jiang,
Vandana Narri,
Hans-Cristian Lindh,
Karl H. Johansson,
Jonas Mårtensson
Abstract:
In this work, we present a small-scale testbed for evaluating the real-life performance of cellular V2X (C-V2X) applications on 5G cellular networks. Despite the growing interest and rapid technology development for V2X applications, researchers still struggle to prototype V2X applications with real wireless networks, hardware, and software in the loop in a controlled environment. To help alleviate this challenge, we present a testbed designed to accelerate development and evaluation of C-V2X applications on 5G cellular networks. By including a small-scale vehicle platform into the testbed design, we significantly reduce the time and effort required to test new C-V2X applications on 5G cellular networks. With a focus around the integration of small-scale vehicle platforms, we detail the design decisions behind the full software and hardware setup of commonly needed intelligent transport system agents (e.g. sensors, servers, vehicles). Moreover, to showcase the testbed's capability to produce industrially-relevant, real world performance evaluations, we present an evaluation of a simple test case inspired from shared situational awareness. Finally, we discuss the upcoming use of the testbed for evaluating 5G cellular network-based shared situational awareness and other C-V2X applications.
Submitted 9 May, 2024;
originally announced May 2024.
-
S-IQA Image Quality Assessment With Compressive Sampling
Authors:
Ronghua Liao,
Chen Hui,
Lang Yuan,
Feng Jiang
Abstract:
No-Reference Image Quality Assessment (IQA) aims at estimating image quality in accordance with subjective human perception. However, most existing NR-IQA methods focus on exploring increasingly complex networks or components to improve the final performance. Such practice imposes great limitations and complexity on IQA methods, especially when they are applied to high-resolution (HR) images in the real world. Actually, most images own high spatial redundancy, especially for those HR data. To further exploit the characteristic and alleviate the issue above, we propose a new framework for Image Quality Assessment with compressive Sampling (dubbed S-IQA), which consists of three components: (1) The Flexible Sampling Module (FSM) samples the image to obtain measurements at an arbitrary ratio. (2) Vision Transformer with the Adaptive Embedding Module (AEM) makes measurements of uniform size and extracts deep features. (3) Dual Branch (DB) allocates weight for every patch and predicts the final quality score. Experiments show that our proposed S-IQA achieves state-of-the-art results on various datasets with less data usage.
Submitted 26 April, 2024;
originally announced April 2024.
-
SC-HVPPNet: Spatial and Channel Hybrid-Attention Video Post-Processing Network with CNN and Transformer
Authors:
Tong Zhang,
Wenxue Cui,
Shaohui Liu,
Feng Jiang
Abstract:
Convolutional Neural Network (CNN) and Transformer have attracted much attention recently for video post-processing (VPP). However, the interaction between CNN and Transformer in existing VPP methods is not fully explored, leading to inefficient communication between the local and global extracted features. In this paper, we explore the interaction between CNN and Transformer in the task of VPP, and propose a novel Spatial and Channel Hybrid-Attention Video Post-Processing Network (SC-HVPPNet), which can cooperatively exploit the image priors in both spatial and channel domains. Specifically, in the spatial domain, a novel spatial attention fusion module is designed, in which two attention weights are generated to fuse the local and global representations collaboratively. In the channel domain, a novel channel attention fusion module is developed, which can blend the deep representations at the channel dimension dynamically. Extensive experiments show that SC-HVPPNet notably boosts video restoration quality, with average bitrate savings of 5.29%, 12.42%, and 13.09% for Y, U, and V components in the VTM-11.0-NNVC RA configuration.
Submitted 22 April, 2024;
originally announced April 2024.
-
Guaranteed Completion of Complex Tasks via Temporal Logic Trees and Hamilton-Jacobi Reachability
Authors:
Frank J. Jiang,
Kaj Munhoz Arfvidsson,
Chong He,
Mo Chen,
Karl H. Johansson
Abstract:
In this paper, we present an approach for guaranteeing the completion of complex tasks with cyber-physical systems (CPS). Specifically, we leverage temporal logic trees constructed using Hamilton-Jacobi reachability analysis to (1) check for the existence of control policies that complete a specified task and (2) develop a computationally-efficient approach to synthesize the full set of control inputs the CPS can implement in real-time to ensure the task is completed. We show that, by checking the approximation directions of each state set in the temporal logic tree, we can check if the temporal logic tree suffers from the "leaking corner issue," where the intersection of reachable sets yields an incorrect approximation. By ensuring a temporal logic tree has no leaking corners, we know the temporal logic tree correctly verifies the existence of control policies that satisfy the specified task. After confirming the existence of control policies, we show that we can leverage the value functions obtained through Hamilton-Jacobi reachability analysis to efficiently compute the set of control inputs the CPS can implement throughout the deployment time horizon to guarantee the completion of the specified task. Finally, we use a newly released Python toolbox to evaluate the presented approach on a simulated driving task.
Submitted 12 April, 2024;
originally announced April 2024.
-
Formal Verification of Linear Temporal Logic Specifications Using Hybrid Zonotope-Based Reachability Analysis
Authors:
Loizos Hadjiloizou,
Frank J. Jiang,
Amr Alanwar,
Karl H. Johansson
Abstract:
In this paper, we introduce a hybrid zonotope-based approach for formally verifying the behavior of autonomous systems operating under Linear Temporal Logic (LTL) specifications. In particular, we formally verify the LTL formula by constructing temporal logic trees (TLT)s via backward reachability analysis (BRA). In previous works, TLTs are predominantly constructed with either highly general and computationally intensive level set-based BRA or simplistic and computationally efficient polytope-based BRA. In this work, we instead propose the construction of TLTs using hybrid zonotope-based BRA. By using hybrid zonotopes, we show that we are able to formally verify LTL specifications in a computationally efficient manner while still being able to represent complex geometries that are often present when deploying autonomous systems, such as non-convex, disjoint sets. Moreover, we evaluate our approach on a parking example, providing preliminary indications of how hybrid zonotopes facilitate computationally efficient formal verification of LTL specifications in environments that naturally lead to non-convex, disjoint geometries.
Submitted 4 April, 2024;
originally announced April 2024.
-
Reachability Analysis Using Constrained Polynomial Logical Zonotopes
Authors:
Ahmad Hafez,
Frank J. Jiang,
Karl H. Johansson,
Amr Alanwar
Abstract:
In this paper, we propose reachability analysis using constrained polynomial logical zonotopes. We perform reachability analysis to compute the set of states that could be reached. To do this, we utilize a recently introduced set representation called polynomial logical zonotopes for performing computationally efficient and exact reachability analysis on logical systems. Notably, polynomial logical zonotopes address the "curse of dimensionality" when analyzing the reachability of logical systems since the set representation can represent $2^h$ binary vectors using $h$ generators. After finishing the reachability analysis, the formal verification involves verifying whether the intersection of the calculated reachable set and the unsafe set is empty or not. Polynomial logical zonotopes lack closure under intersections, prompting the formulation of constrained polynomial logical zonotopes, which preserve the computational efficiency and exactness of polynomial logical zonotopes for reachability analysis while enabling exact intersections. Additionally, an extensive empirical study is presented to demonstrate and validate the advantages of constrained polynomial logical zonotopes.
Submitted 19 June, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
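The verification step described in the abstract above (compute the reachable set, then check whether it intersects the unsafe set) can be illustrated with a minimal sketch. Note the assumptions: this uses plain explicit-state enumeration rather than constrained polynomial logical zonotopes, and the three-bit dynamics in `step`, along with the names `reachable_sets` and `is_safe`, are hypothetical and not taken from the paper.

```python
def step(state, u):
    """Hypothetical Boolean dynamics on a 3-bit state with one input bit u."""
    x1, x2, x3 = state
    return (x2 ^ u, x1 & x3, x1 | u)


def reachable_sets(initial, horizon):
    """Return [R_0, R_1, ..., R_horizon], where R_k is the set of states
    reachable in exactly k steps under any input sequence."""
    frontier = set(initial)
    sets = [frontier]
    for _ in range(horizon):
        # Apply every admissible input to every state in the current set.
        frontier = {step(s, u) for s in frontier for u in (0, 1)}
        sets.append(frontier)
    return sets


def is_safe(initial, unsafe, horizon):
    """Safe iff every reachable set has an empty intersection with `unsafe`."""
    return all(r.isdisjoint(unsafe) for r in reachable_sets(initial, horizon))


if __name__ == "__main__":
    start = {(0, 0, 0)}
    print(is_safe(start, {(1, 1, 0)}, horizon=2))
    print(is_safe(start, {(1, 1, 1)}, horizon=2))
```

Explicit enumeration stores every state, so its cost can grow exponentially with the number of bits; that cost is what a compact set representation is meant to avoid.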
-
Integrated Communications and Localization for Massive MIMO LEO Satellite Systems
Authors:
Li You,
Xiaoyu Qiang,
Yongxiang Zhu,
Fan Jiang,
Christos G. Tsinos,
Wenjin Wang,
Henk Wymeersch,
Xiqi Gao,
Björn Ottersten
Abstract:
Integrated communications and localization (ICAL) will play an important part in future sixth generation (6G) networks for the realization of Internet of Everything (IoE) to support both global communications and seamless localization. Massive multiple-input multiple-output (MIMO) low earth orbit (LEO) satellite systems have great potential in providing wide coverage with enhanced gains, and thus are strong candidates for realizing ubiquitous ICAL. In this paper, we develop a wideband massive MIMO LEO satellite system to simultaneously support wireless communications and localization operations in the downlink. In particular, we first characterize the signal propagation properties and derive a localization performance bound. Based on these analyses, we focus on the hybrid analog/digital precoding design to achieve high communication capability and localization precision. Numerical results demonstrate that the proposed ICAL scheme supports both the wireless communication and localization operations for typical system setups.
Submitted 12 March, 2024;
originally announced March 2024.
-
Segment Any Cell: A SAM-based Auto-prompting Fine-tuning Framework for Nuclei Segmentation
Authors:
Saiyang Na,
Yuzhi Guo,
Feng Jiang,
Hehuan Ma,
Junzhou Huang
Abstract:
In the rapidly evolving field of AI research, foundational models like BERT and GPT have significantly advanced language and vision tasks. The advent of pretrain-prompting models such as ChatGPT and Segmentation Anything Model (SAM) has further revolutionized image segmentation. However, their applications in specialized areas, particularly in nuclei segmentation within medical imaging, reveal a key challenge: the generation of high-quality, informative prompts is as crucial as applying state-of-the-art (SOTA) fine-tuning techniques on foundation models. To address this, we introduce Segment Any Cell (SAC), an innovative framework that enhances SAM specifically for nuclei segmentation. SAC integrates a Low-Rank Adaptation (LoRA) within the attention layer of the Transformer to improve the fine-tuning process, outperforming existing SOTA methods. It also introduces an innovative auto-prompt generator that produces effective prompts to guide segmentation, a critical factor in handling the complexities of nuclei segmentation in biomedical imaging. Our extensive experiments demonstrate the superiority of SAC in nuclei segmentation tasks, proving its effectiveness as a tool for pathologists and researchers. Our contributions include a novel prompt generation strategy, automated adaptability for diverse segmentation tasks, the innovative application of Low-Rank Attention Adaptation in SAM, and a versatile framework for semantic segmentation challenges.
Submitted 23 January, 2024;
originally announced January 2024.
-
A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model
Authors:
Dongdi Zhao,
Jianbo Ma,
Lu Lu,
Jinke Li,
Xuan Ji,
Lei Zhu,
Fuming Fang,
Ming Liu,
Feijun Jiang
Abstract:
Far-field speech recognition is a challenging task that conventionally uses signal processing beamforming to attack noise and interference problem. But the performance has been found usually limited due to heavy reliance on environmental assumption. In this paper, we propose a unified multichannel far-field speech recognition system that combines the neural beamforming and transformer-based Listen, Spell, Attend (LAS) speech recognition system, which extends the end-to-end speech recognition system further to include speech enhancement. Such framework is then jointly trained to optimize the final objective of interest. Specifically, factored complex linear projection (fCLP) has been adopted to form the neural beamforming. Several pooling strategies to combine look directions are then compared in order to find the optimal approach. Moreover, information of the source direction is also integrated in the beamforming to explore the usefulness of source direction as a prior, which is usually available especially in multi-modality scenario. Experiments on different microphone array geometry are conducted to evaluate the robustness against spacing variance of microphone array. Large in-house databases are used to evaluate the effectiveness of the proposed framework and the proposed method achieve 19.26\% improvement when compared with a strong baseline.
Submitted 5 January, 2024;
originally announced January 2024.
-
How AI-driven Digital Twins Can Empower Mobile Networks
Authors:
Tong Li,
Fenyu Jiang,
Qiaohong Yu,
Wenzhen Huang,
Tao Jiang,
Depeng Jin
Abstract:
The growing complexity of next-generation networks exacerbates the modeling and algorithmic flaws of conventional network optimization methodology. In this paper, we propose a mobile network digital twin (MNDT) architecture for 6G networks. To address the modeling and algorithmic shortcomings, the MNDT uses a simulation-optimization structure. The feedback from the network simulation engine, which serves as validation for the optimizer's decision outcomes, is used explicitly to train artificial intelligence (AI) empowered optimizers iteratively. In practice, we develop a network digital twin prototype system leveraging data-driven technology to accurately model the behaviors of mobile network elements (e.g., mobile users and base stations), wireless environments, and network performance. An AI-powered network optimizer has been developed based on the deployed MNDT prototype system for providing reliable and optimized network configurations. The results of the experiments demonstrate that the proposed MNDT infrastructure can provide practical network optimization solutions while adapting to the more complex environment.
Submitted 20 November, 2023;
originally announced November 2023.
-
Robust Indoor Localization with Ranging-IMU Fusion
Authors:
Fan Jiang,
David Caruso,
Ashutosh Dhekne,
Qi Qu,
Jakob Julian Engel,
Jing Dong
Abstract:
Indoor wireless ranging localization is a promising approach for low-power and high-accuracy localization of wearable devices. A primary challenge in this domain stems from non-line of sight propagation of radio waves. This study tackles a fundamental issue in wireless ranging: the unpredictability of real-time multipath determination, especially in challenging conditions such as when there is no direct line of sight. We achieve this by fusing range measurements with inertial measurements obtained from a low cost Inertial Measurement Unit (IMU). For this purpose, we introduce a novel asymmetric noise model crafted specifically for non-Gaussian multipath disturbances. Additionally, we present a novel Levenberg-Marquardt (LM)-family trust-region adaptation of the iSAM2 fusion algorithm, which is optimized for robust performance for our ranging-IMU fusion problem. We evaluate our solution in a densely occupied real office environment. Our proposed solution can achieve temporally consistent localization with an average absolute accuracy of $\sim$0.3m in real-world settings. Furthermore, our results indicate that we can achieve comparable accuracy even with infrequent (1Hz) range measurements.
Submitted 15 September, 2023;
originally announced September 2023.
-
A Novel Catastrophic Condition for Periodically Time-varying Convolutional Encoders Based on Time-varying Equivalent Convolutional Encoders
Authors:
Fan Jiang
Abstract:
A convolutional encoder is said to be catastrophic if it maps an information sequence of infinite weight into a code sequence of finite weight. As a consequence of this mapping, a finite number of channel errors may cause an infinite number of information bit errors when decoding. This situation should be avoided. A catastrophic condition to determine if a time-invariant convolutional encoder is catastrophic or not is stated in \cite{Massey:LSC}. Palazzo developed this condition for periodically time-varying convolutional encoders in \cite{Palazzo:Analysis}. Since Palazzo's condition is based on the state transition table of the constituent encoders, its complexity increases exponentially with the number of memory elements in the encoders. A novel catastrophic condition making use of time-varying equivalent convolutional encoders is presented in this letter. A technique to convert a catastrophic periodically time-varying convolutional encoder into a non-catastrophic one can also be developed based on these encoders. Since they do not involve the state transitions of the convolutional encoder, the time complexity of these methods grows linearly with the encoder memory.
Submitted 11 September, 2023;
originally announced September 2023.
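For context on the time-invariant case mentioned in the abstract above, the classic test for a rate-1/n feedforward encoder is usually stated as follows: the encoder is non-catastrophic if and only if the greatest common divisor of its generator polynomials over GF(2) is a pure delay D^l with l >= 0. The sketch below checks only that textbook condition. It is not the letter's time-varying method, and the function names and the bitmask encoding (bit i holds the coefficient of D^i) are assumptions made for illustration.

```python
from functools import reduce


def gf2_mod(a, b):
    """Remainder of a divided by b, for GF(2) polynomials stored as int bitmasks."""
    if b == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    while a.bit_length() >= b.bit_length():
        # Cancel the leading term of a with a shifted copy of b (XOR = GF(2) subtraction).
        a ^= b << (a.bit_length() - b.bit_length())
    return a


def gf2_gcd(a, b):
    """Euclidean algorithm for GF(2) polynomials stored as int bitmasks."""
    while b:
        a, b = b, gf2_mod(a, b)
    return a


def is_catastrophic(generators):
    """True if the gcd of the generator polynomials is not a pure delay D^l."""
    g = reduce(gf2_gcd, generators)
    if g == 0:
        raise ValueError("at least one generator polynomial must be nonzero")
    # A pure delay D^l has exactly one nonzero coefficient, i.e. one set bit.
    return g & (g - 1) != 0


if __name__ == "__main__":
    # 1 + D + D^2 and 1 + D^2 are coprime, so this encoder passes the test.
    print(is_catastrophic([0b111, 0b101]))
    # 1 + D divides 1 + D^2 over GF(2), so this encoder fails the test.
    print(is_catastrophic([0b11, 0b101]))
```

The second pair fails because 1 + D^2 = (1 + D)^2 over GF(2), so the two generators share the non-trivial factor 1 + D.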
-
Data-Driven Reachability Analysis of Pedestrians Using Behavior Modes
Authors:
August Söderlund,
Frank J. Jiang,
Vandana Narri,
Amr Alanwar,
Karl H. Johansson
Abstract:
In this paper, we present a data-driven approach for safely predicting the future state sets of pedestrians. Previous approaches to predicting the future state sets of pedestrians either do not provide safety guarantees or are overly conservative. Moreover, an additional challenge is the selection or identification of a model that sufficiently captures the motion of pedestrians. To address these issues, this paper introduces the idea of splitting previously collected, historical pedestrian trajectories into different behavior modes for performing data-driven reachability analysis. Through this proposed approach, we are able to use data-driven reachability analysis to capture the future state sets of pedestrians, while being less conservative and still maintaining safety guarantees. Furthermore, this approach is modular and can support different approaches for behavior splitting. To illustrate the efficacy of the approach, we implement our method with a basic behavior-splitting module and evaluate the implementation on an open-source data set of real pedestrian trajectories. In this evaluation, we find that the modal reachable sets are less conservative and more descriptive of the future state sets of the pedestrian.
Submitted 21 August, 2023;
originally announced August 2023.
-
Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder
Authors:
Yusheng Dai,
Hang Chen,
Jun Du,
Xiaofei Ding,
Ning Ding,
Feijun Jiang,
Chin-Hui Lee
Abstract:
In recent research, slight performance improvement is observed from automatic speech recognition systems to audio-visual speech recognition systems in the end-to-end framework with low-quality videos. Unmatching convergence rates and specialized input representations between audio and visual modalities are considered to cause the problem. In this paper, we propose two novel techniques to improve audio-visual speech recognition (AVSR) under a pre-training and fine-tuning training framework. First, we explore the correlation between lip shapes and syllable-level subword units in Mandarin to establish good frame-level syllable boundaries from lip shapes. This enables accurate alignment of video and audio streams during visual model pre-training and cross-modal fusion. Next, we propose an audio-guided cross-modal fusion encoder (CMFE) neural network to utilize main training parameters for multiple cross-modal attention layers to make full use of modality complementarity. Experiments on the MISP2021-AVSR data set show the effectiveness of the two proposed techniques. Together, using only a relatively small amount of training data, the final system achieves better performances than state-of-the-art systems with more complex front-ends and back-ends.
Submitted 8 March, 2024; v1 submitted 14 August, 2023;
originally announced August 2023.
-
Polynomial Logical Zonotopes: A Set Representation for Reachability Analysis of Logical Systems
Authors:
Amr Alanwar,
Frank J. Jiang,
Karl H. Johansson
Abstract:
In this paper, we introduce a set representation called polynomial logical zonotopes for performing exact and computationally efficient reachability analysis on logical systems. Polynomial logical zonotopes are a generalization of logical zonotopes, which are able to represent up to 2^n binary vectors using only n generators. Due to their construction, logical zonotopes are only able to support exact computations of some logical operations (XOR, NOT, XNOR), while other operations (AND, NAND, OR, NOR) result in over-approximations in the reduced space, i.e., generator space. In order to perform all fundamental logical operations exactly, we formulate a generalization of logical zonotopes that is constructed by dependent generators and exponent matrices. We prove that through this polynomial-like construction, we are able to perform all of the fundamental logical operations (XOR, NOT, XNOR, AND, NAND, OR, NOR) exactly in the generator space. While we are able to perform all of the logical operations exactly, this comes with a slight increase in computational complexity compared to logical zonotopes. We show that we can use polynomial logical zonotopes to perform exact reachability analysis while retaining a low computational complexity. To illustrate and showcase the computational benefits of polynomial logical zonotopes, we present the results of performing reachability analysis on two use cases: (1) safety verification of an intersection crossing protocol and (2) reachability analysis on a high-dimensional Boolean function. Moreover, to highlight the extensibility of logical zonotopes, we include an additional use case where we perform a computationally tractable exhaustive search for the key of a linear feedback shift register.
Submitted 1 March, 2024; v1 submitted 21 June, 2023;
originally announced June 2023.
-
Multi-Scale Simulation of Complex Systems: A Perspective of Integrating Knowledge and Data
Authors:
Huandong Wang,
Huan Yan,
Can Rong,
Yuan Yuan,
Fenyu Jiang,
Zhenyu Han,
Hongjie Sui,
Depeng Jin,
Yong Li
Abstract:
Complex system simulation has been playing an irreplaceable role in understanding, predicting, and controlling diverse complex systems. In the past few decades, the multi-scale simulation technique has drawn increasing attention for its remarkable ability to overcome the challenges of complex system simulation with unknown mechanisms and expensive computational costs. In this survey, we systematically review the literature on multi-scale simulation of complex systems from the perspective of knowledge and data. First, we present background knowledge about complex system simulation and the scales in complex systems. Then, we divide the main objectives of multi-scale modeling and simulation into five categories by considering scenarios with clear scale and scenarios with unclear scale, respectively. After summarizing the general methods for multi-scale simulation based on the clues of knowledge and data, we introduce the adopted methods to achieve different objectives. Finally, we introduce the applications of multi-scale simulation in typical matter systems and social systems.
Submitted 17 June, 2023;
originally announced June 2023.
-
Hierarchical Interactive Reconstruction Network For Video Compressive Sensing
Authors:
Tong Zhang,
Wenxue Cui,
Chen Hui,
Feng Jiang
Abstract:
Deep network-based image and video Compressive Sensing (CS) has attracted increasing attention in recent years. However, in the existing deep network-based CS methods, a simple stacked convolutional network is usually adopted, which not only weakens the perception of rich contextual prior knowledge, but also limits the exploration of the correlations between temporal video frames. In this paper, we propose a novel Hierarchical InTeractive Video CS Reconstruction Network (HIT-VCSNet), which can cooperatively exploit the deep priors in both spatial and temporal domains to improve the reconstruction quality. Specifically, in the spatial domain, a novel hierarchical structure is designed, which can hierarchically extract deep features from keyframes and non-keyframes. In the temporal domain, a novel hierarchical interaction mechanism is proposed, which can cooperatively learn the correlations among different frames in the multiscale space. Extensive experiments demonstrate that the proposed HIT-VCSNet outperforms the existing state-of-the-art video and image CS methods by a large margin.
Submitted 15 April, 2023;
originally announced April 2023.
-
Estimating Continuous Muscle Fatigue For Multi-Muscle Coordinated Exercise: A Pilot Study
Authors:
Chunzhi Yi,
Baichun Wei,
Wei Jin,
Jianfei Zhu,
Seungmin Rho,
Zhiyuan Chen,
Feng Jiang
Abstract:
Assessing the progression of muscle fatigue for daily exercises provides vital indicators for precise rehabilitation and personalized training dose, especially in the context of the Metaverse. Assessing fatigue of multi-muscle coordination-involved daily exercises requires the neuromuscular features that represent the fatigue-induced characteristics of spatiotemporal adaptations of multiple muscles and the estimator that captures the time-evolving progression of fatigue. In this paper, we propose to depict fatigue by the features of muscle compensation and spinal module activation changes and estimate continuous fatigue by a physiological rationale model. First, we extract muscle synergy fractionation and the variance of spinal module spikings as features inspired by the prior of fatigue-induced neuromuscular adaptations. Second, we treat the features as observations and develop a Bayesian Gaussian process to capture the time-evolving progression. Third, we solve the issue of lacking supervision information by mathematically formulating the time-evolving characteristics of fatigue as the loss function. Finally, we adapt the metrics that follow the physiological principles of fatigue to quantitatively evaluate the performance. Our extensive experiments present a 0.99 similarity between days, a similarity of over 0.7 with other views of fatigue, and a weak monotonicity of nearly 1, which outperform other methods. This study aims at the objective assessment of muscle fatigue.
Submitted 29 March, 2023;
originally announced March 2023.
-
Experimental Validation of Single BS 5G mmWave Positioning and Mapping for Intelligent Transport
Authors:
Yu Ge,
Hedieh Khosravi,
Fan Jiang,
Hui Chen,
Simon Lindberg,
Peter Hammarberg,
Hyowon Kim,
Oliver Brunnegård,
Olof Eriksson,
Bengt-Erik Olsson,
Fredrik Tufvesson,
Lennart Svensson,
Henk Wymeersch
Abstract:
Positioning with 5G signals generally requires connection to several base stations (BSs), which makes positioning more demanding in terms of infrastructure than communications. To address this issue, there have been several theoretical studies on single BS positioning, leveraging high-resolution angle and delay estimation and multipath exploitation possibilities at mmWave frequencies. This paper presents the first realistic experimental validation of such studies, involving a commercial 5G mmWave BS and a user equipment (UE) development kit mounted on a test vehicle. We present the relevant signal models, signal processing methods (including channel parameter estimation and position estimation), and validate these based on real data collected in an outdoor science park environment. Our results indicate that positioning is possible, but the performance with a single BS is limited by the knowledge of the position and orientation of the infrastructure and the multipath visibility and diversity.
Submitted 21 March, 2023;
originally announced March 2023.
-
ESPRIT-Oriented Precoder Design for mmWave Channel Estimation
Authors:
Musa Furkan Keskin,
Alessio Fascista,
Fan Jiang,
Angelo Coluccia,
Gonzalo Seco-Granados,
Henk Wymeersch
Abstract:
We consider the problem of ESPRIT-oriented precoder design for beamspace angle-of-departure (AoD) estimation in downlink mmWave multiple-input single-output communications. Standard precoders (i.e., directional/sum beams) yield poor performance in AoD estimation, while Cramer-Rao bound-optimized precoders undermine the so-called shift invariance property (SIP) of ESPRIT. To tackle this issue, the problem of designing ESPRIT-oriented precoders is formulated to jointly optimize over the precoding matrix and the SIP-restoring matrix of ESPRIT. We develop an alternating optimization approach that updates these two matrices under unit-modulus constraints for analog beamforming architectures. Simulation results demonstrate the validity of the proposed approach while providing valuable insights on the beampatterns of the ESPRIT-oriented precoders.
Submitted 4 January, 2023;
originally announced January 2023.
-
Logical Zonotopes: A Set Representation for the Formal Verification of Boolean Functions
Authors:
Amr Alanwar,
Frank J. Jiang,
Samy Amin,
Karl H. Johansson
Abstract:
A logical zonotope, which is a new set representation for binary vectors, is introduced in this paper. A logical zonotope is constructed by XOR-ing a binary vector with a combination of other binary vectors called generators. Such a zonotope can represent up to 2^n binary vectors using only n generators. It is shown that logical operations over sets of binary vectors can be performed on the zonotopes' generators and, thus, significantly reduce the computational complexity of various logical operations (e.g., XOR, NAND, AND, OR, and semi-tensor products). Similar to traditional zonotopes' role in the formal verification of dynamical systems over real vector spaces, logical zonotopes can efficiently analyze discrete dynamical systems defined over binary vector spaces. We illustrate the approach and its ability to reduce the computational complexity in two use cases: (1) encryption key discovery of a linear feedback shift register and (2) safety verification of a road traffic intersection protocol.
Submitted 26 August, 2023; v1 submitted 16 October, 2022;
originally announced October 2022.
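The construction described in the abstract above (a binary center vector XOR-ed with any subset of n generators, giving up to 2^n binary vectors) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `logical_zonotope_points` and `xor_zonotopes`, and the representation of a zonotope as a `(center, generators)` pair of tuples, are assumptions made for illustration only.

```python
from itertools import product


def logical_zonotope_points(center, generators):
    """Enumerate every binary vector represented by the zonotope.

    Each point is center XOR (b_1 * g_1) XOR ... XOR (b_n * g_n),
    where every coefficient b_i is 0 or 1.
    """
    points = set()
    for coeffs in product((0, 1), repeat=len(generators)):
        point = list(center)
        for b, g in zip(coeffs, generators):
            if b:  # include this generator by XOR-ing it in
                point = [p ^ gi for p, gi in zip(point, g)]
        points.add(tuple(point))
    return points


def xor_zonotopes(z1, z2):
    """XOR two zonotopes in generator space.

    The centers are XOR-ed and the generator lists are concatenated,
    so the set-level XOR never enumerates individual points.
    """
    (c1, g1), (c2, g2) = z1, z2
    center = tuple(a ^ b for a, b in zip(c1, c2))
    return center, list(g1) + list(g2)


if __name__ == "__main__":
    z1 = ((0, 0, 1), [(1, 0, 0)])
    z2 = ((1, 0, 0), [(0, 1, 0)])
    combined = xor_zonotopes(z1, z2)
    print(sorted(logical_zonotope_points(*combined)))
```

The enumeration is exponential in the number of generators and is only there to check results on small examples; the point of the representation is that an operation such as XOR touches the n generators rather than the up to 2^n points.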
-
Wideband mmWave Massive MIMO Channel Estimation and Localization
Authors:
Shudi Weng,
Fan Jiang,
Henk Wymeersch
Abstract:
Spatial wideband effects are known to affect channel estimation and localization performance in millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) systems. Based on perturbation analysis, we show that the spatial wideband effect is in fact more pronounced than previously thought and significantly degrades performance, even at moderate bandwidths, if it is not properly considered in the algorithm design. We propose a novel channel estimation method based on multidimensional ESPRIT per subcarrier, combined with unsupervised learning for pairing across subcarriers, which shows significant performance gain over existing schemes under wideband conditions.
Submitted 18 September, 2022;
originally announced September 2022.
-
Two-Timescale Transmission Design and RIS Optimization for Integrated Localization and Communications
Authors:
Fan Jiang,
Andrea Abrardo,
Kamran Keykhoshravi,
Henk Wymeersch,
Davide Dardari,
Marco Di Renzo
Abstract:
Reconfigurable intelligent surfaces (RISs) have tremendous potential to boost communication performance, especially when the line-of-sight (LOS) path between the user equipment (UE) and base station (BS) is blocked. To control the RIS, channel state information (CSI) is needed, which entails significant pilot overhead. To reduce this overhead and the need for frequent RIS reconfiguration, we propose a novel framework for integrated localization and communications, where RIS configurations are fixed during location coherence intervals, while BS precoders are optimized every channel coherence interval. This framework leverages accurate location information obtained with the aid of several RISs as well as novel RIS optimization and channel estimation methods. Performance in terms of localization accuracy, channel estimation error, and achievable rate demonstrates the effectiveness of the proposed approach.
Submitted 20 November, 2023; v1 submitted 6 September, 2022;
originally announced September 2022.
-
An Iterative 5G Positioning and Synchronization Algorithm in NLOS Environments with Multi-Bounce Paths
Authors:
Zhixing Li,
Fan Jiang,
Henk Wymeersch,
Fuxi Wen
Abstract:
5G positioning is a very promising area that presents many opportunities and challenges. Many existing techniques rely on multiple anchor nodes and line-of-sight (LOS) paths, or single reference node and single-bounce non-LOS (NLOS) paths. However, in dense multipath environments, identifying the LOS or single-bounce assumptions is challenging. The multi-bounce paths will make the positioning accuracy deteriorate significantly. We propose a robust 5G positioning algorithm in NLOS multipath environments. The corresponding positioning problem is formulated as an iterative and weighted least squares problem, and different weights are utilized to mitigate the effects of multi-bounce paths. Numerical simulations are carried out to evaluate the performance of the proposed algorithm. Compared with the benchmark positioning algorithms only using the single-bounce paths, similar positioning accuracy is achieved for the proposed algorithm.
Submitted 4 September, 2022;
originally announced September 2022.
-
Doppler Exploitation in Bistatic mmWave Radio SLAM
Authors:
Yu Ge,
Ossi Kaltiokallio,
Hui Chen,
Fan Jiang,
Jukka Talvitie,
Mikko Valkama,
Lennart Svensson,
Henk Wymeersch
Abstract:
Networks in 5G and beyond utilize millimeter wave (mmWave) radio signals, large bandwidths, and large antenna arrays, which bring opportunities in jointly localizing the user equipment and mapping the propagation environment, termed as simultaneous localization and mapping (SLAM). Existing approaches mainly rely on delays and angles, and ignore the Doppler, although it contains geometric information. In this paper, we study the benefits of exploiting Doppler in SLAM through deriving the posterior Cramér-Rao bounds (PCRBs) and formulating the extended Kalman-Poisson multi-Bernoulli sequential filtering solution with Doppler as one of the involved measurements. Both theoretical PCRB analysis and simulation results demonstrate the efficacy of utilizing Doppler.
Submitted 22 August, 2022;
originally announced August 2022.
-
Low-Complexity Acoustic Echo Cancellation with Neural Kalman Filtering
Authors:
Dong Yang,
Fei Jiang,
Wei Wu,
Xuefei Fang,
Muyong Cao
Abstract:
The Kalman filter has been adopted in acoustic echo cancellation due to its robustness to double-talk, fast convergence, and good steady-state performance. The performance of the Kalman filter is closely related to the estimation accuracy of the state noise covariance and the observation noise covariance. The estimation error may lead to unacceptable results; in particular, when the echo path undergoes abrupt changes, the tracking performance of the Kalman filter can degrade significantly. In this paper, we propose the neural Kalman filtering (NKF), which uses neural networks to implicitly model the covariance of the state noise and observation noise and to output the Kalman gain in real-time. Experimental results on both synthetic test sets and real-recorded test sets show that the proposed NKF has superior convergence and re-convergence performance while ensuring low near-end speech degradation compared with the state-of-the-art model-based methods. Moreover, the model size of the proposed NKF is merely 5.3 K and the RTF is as low as 0.09, which indicates that it can be deployed in low-resource platforms.
Submitted 29 October, 2022; v1 submitted 22 July, 2022;
originally announced July 2022.
-
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre
Authors:
Guangyan Zhang,
Ying Qin,
Wenjie Zhang,
Jialun Wu,
Mei Li,
Yutao Gai,
Feijun Jiang,
Tan Lee
Abstract:
The capability of generating speech with a specific type of emotion is desired for many applications of human-computer interaction. Cross-speaker emotion transfer is a common approach to generating emotional speech when speech with emotion labels from target speakers is not available for model training. This paper presents a novel cross-speaker emotion transfer system, named iEmoTTS. The system is composed of an emotion encoder, a prosody predictor, and a timbre encoder. The emotion encoder extracts the identity of emotion type as well as the respective emotion intensity from the mel-spectrogram of input speech. The emotion intensity is measured by the posterior probability that the input utterance carries that emotion. The prosody predictor is used to provide prosodic features for emotion transfer. The timbre encoder provides timbre-related information for the system. Unlike many other studies which focus on disentangling speaker and style factors of speech, the iEmoTTS is designed to achieve cross-speaker emotion transfer via disentanglement between prosody and timbre. Prosody is considered as the main carrier of emotion-related speech characteristics and timbre accounts for the essential characteristics for speaker identification. Zero-shot emotion transfer, meaning that speech of target speakers is not seen in model training, is also realized with iEmoTTS. Extensive experiments of subjective evaluation have been carried out. The results demonstrate the effectiveness of iEmoTTS as compared with other recently proposed systems of cross-speaker emotion transfer. It is shown that iEmoTTS can produce speech with designated emotion type and controllable emotion intensity. With appropriate information bottleneck capacity, iEmoTTS is able to effectively transfer emotion information to a new speaker. Audio samples are publicly available at https://patrick-g-zhang.github.io/iemotts/
Submitted 4 January, 2023; v1 submitted 29 June, 2022;
originally announced June 2022.
-
Doppler-Enabled Single-Antenna Localization and Mapping Without Synchronization
Authors:
Hui Chen,
Fan Jiang,
Yu Ge,
Hyowon Kim,
Henk Wymeersch
Abstract:
Radio localization is a key enabler for joint communication and sensing in the fifth/sixth generation (5G/6G) communication systems. With the help of multipath components (MPCs), localization and mapping tasks can be done with a single base station (BS) and single unsynchronized user equipment (UE) if both of them are equipped with an antenna array. However, the antenna array at the UE side increases the hardware and computational cost, preventing localization functionality. In this work, we show that with Doppler estimation and MPCs, localization and mapping tasks can be performed even with a single-antenna mobile UE. Furthermore, we show that the localization and mapping performance will improve and then saturate at a certain level with an increased UE speed. Both theoretical Cramér-Rao bound analysis and simulation results show the potential of localization under mobility and the effectiveness of the proposed localization algorithm.
Submitted 30 May, 2022;
originally announced May 2022.
-
Music Source Separation with Generative Flow
Authors:
Ge Zhu,
Jordan Darefsky,
Fei Jiang,
Anton Selitskiy,
Zhiyao Duan
Abstract:
Fully-supervised models for source separation are trained on parallel mixture-source data and are currently state-of-the-art. However, such parallel data is often difficult to obtain, and it is cumbersome to adapt trained models to mixtures with new sources. Source-only supervised models, in contrast, only require individual source data for training. In this paper, we first leverage flow-based generators to train individual music source priors and then use these models, along with likelihood-based objectives, to separate music mixtures. We show that in singing voice separation and music separation tasks, our proposed method is competitive with a fully-supervised approach. We also demonstrate that we can flexibly add new types of sources, whereas fully-supervised approaches would require retraining of the entire model.
Submitted 16 October, 2022; v1 submitted 19 April, 2022;
originally announced April 2022.
-
SADN: Learned Light Field Image Compression with Spatial-Angular Decorrelation
Authors:
Kedeng Tong,
Xin Jin,
Chen Wang,
Fan Jiang
Abstract:
Light field images are becoming one of the most promising media types for immersive video applications. In this paper, we propose a novel end-to-end spatial-angular-decorrelated network (SADN) for high-efficiency light field image compression. Different from the existing methods that exploit either spatial or angular consistency in the light field image, SADN decouples the angular and spatial information by dilation convolution and stride convolution in spatial-angular interaction, and performs feature fusion to compress spatial and angular information jointly. To train a stable and robust algorithm, a large-scale dataset consisting of 7549 light field images is proposed and built. The proposed method provides 2.137 times and 2.849 times higher compression efficiency relative to H.266/VVC and H.265/HEVC inter coding, respectively. It also outperforms the end-to-end image compression networks by an average of 79.6% bitrate saving with much higher subjective quality and light field consistency.
Submitted 22 February, 2022;
originally announced February 2022.
-
Image Compressed Sensing Using Non-local Neural Network
Authors:
Wenxue Cui,
Shaohui Liu,
Feng Jiang,
Debin Zhao
Abstract:
Deep network-based image Compressed Sensing (CS) has attracted much attention in recent years. However, the existing deep network-based CS schemes either reconstruct the target image in a block-by-block manner that leads to serious block artifacts or train the deep network as a black box that brings about limited insights of image prior knowledge. In this paper, a novel image CS framework using non-local neural network (NL-CSNet) is proposed, which utilizes the non-local self-similarity priors with deep network to improve the reconstruction quality. In the proposed NL-CSNet, two non-local subnetworks are constructed for utilizing the non-local self-similarity priors in the measurement domain and the multi-scale feature domain respectively. Specifically, in the subnetwork of measurement domain, the long-distance dependencies between the measurements of different image blocks are established for better initial reconstruction. Analogously, in the subnetwork of multi-scale feature domain, the affinities between the dense feature representations are explored in the multi-scale space for deep reconstruction. Furthermore, a novel loss function is developed to enhance the coupling between the non-local representations, which also enables an end-to-end training of NL-CSNet. Extensive experiments demonstrate that NL-CSNet outperforms existing state-of-the-art CS methods, while maintaining fast computational speed.
Submitted 7 December, 2021;
originally announced December 2021.
-
Iterated Posterior Linearization PMB Filter for 5G SLAM
Authors:
Yu Ge,
Yibo Wu,
Fan Jiang,
Ossi Kaltiokallio,
Jukka Talvitie,
Mikko Valkama,
Lennart Svensson,
Henk Wymeersch
Abstract:
5G millimeter wave (mmWave) signals have inherent geometric connections to the propagation channel and the propagation environment. Thus, they can be used to jointly localize the receiver and map the propagation environment, which is termed as simultaneous localization and mapping (SLAM). One of the most important tasks in the 5G SLAM is to deal with the nonlinearity of the measurement model. To solve this problem, existing 5G SLAM approaches rely on sigma-point or extended Kalman filters, linearizing the measurement function with respect to the prior probability density function (PDF). In this paper, we study the linearization of the measurement function with respect to the posterior PDF, and implement the iterated posterior linearization filter into the Poisson multi-Bernoulli SLAM filter. Simulation results demonstrate the accuracy and precision improvements of the resulting SLAM filter.
Submitted 5 December, 2021;
originally announced December 2021.
-
Beamspace Multidimensional ESPRIT Approaches for Simultaneous Localization and Communications
Authors:
Fan Jiang,
Fuxi Wen,
Yu Ge,
Meifang Zhu,
Henk Wymeersch,
Fredrik Tufvesson
Abstract:
Modern wireless communication systems operating at high carrier frequencies are characterized by a high dimensionality of the underlying parameter space (including channel gains, angles, delays, and possibly Doppler shifts). Estimating these parameters is valuable for communication purposes, but also for localization and sensing, making channel estimation a critical component in any joint communication and localization or sensing application. The high dimensionality makes it difficult to use search-based methods such as maximum likelihood. Search-free methods such as ESPRIT provide an attractive alternative, but require a complex decomposition step in both the tensor and matrix version of ESPRIT. To mitigate this, we propose, develop, and analyze a reduced complexity beamspace ESPRIT method. Complexity is reduced both by beamspace processing as well as low-complexity implementation of the singular value decomposition. A novel perturbation analysis provides important insights for both channel estimation and localization performance. The proposed method is compared to the tensor ESPRIT method, in terms of channel estimation, communication, localization, and sensing performance, further validating the perturbation analysis.
Submitted 14 November, 2021;
originally announced November 2021.
-
Beyond 5G RIS mmWave Systems: Where Communication and Localization Meet
Authors:
Jiguang He,
Fan Jiang,
Kamran Keykhosravi,
Joonas Kokkoniemi,
Henk Wymeersch,
Markku Juntti
Abstract:
Upcoming beyond fifth generation (5G) communications systems aim at further enhancing key performance indicators and fully supporting brand new use cases by embracing emerging techniques, e.g., reconfigurable intelligent surface (RIS), integrated communication, localization, and sensing, and mmWave/THz communications. The wireless intelligence empowered by state-of-the-art artificial intelligence techniques has been widely considered at the transceivers, and now the paradigm is deemed to be shifted to the smart control of radio propagation environment by virtue of RISs. In this article, we argue that to harness the full potential of RISs, localization and communication must be tightly coupled. This is in sharp contrast to 5G and earlier generations, where localization was a minor additional service. To support this, we first introduce the fundamentals of RIS mmWave channel modeling, followed by RIS channel state information acquisition and link establishment. Then, we deal with the connection between localization and communications, from a separate and joint perspective.
Submitted 16 September, 2021;
originally announced September 2021.
-
Enhancing Data-Driven Reachability Analysis using Temporal Logic Side Information
Authors:
Amr Alanwar,
Frank J. Jiang,
Maryam Sharifi,
Dimos V. Dimarogonas,
Karl H. Johansson
Abstract:
This paper presents algorithms for performing data-driven reachability analysis under temporal logic side information. In certain scenarios, the data-driven reachable sets of a robot can be prohibitively conservative due to the inherent noise in the robot's historical measurement data. In the same scenarios, we often have side information about the robot's expected motion (e.g., limits on how much a robot can move in one time step) that could be useful for further specifying the reachability analysis. In this work, we show that if we can model this side information using a signal temporal logic (STL) fragment, we can constrain the data-driven reachability analysis and safely limit the conservatism of the computed reachable sets. Moreover, we provide formal guarantees that, even after incorporating side information, the computed reachable sets still properly over-approximate the robot's future states. Lastly, we empirically validate the practicality of the over-approximation by computing constrained, data-driven reachable sets for the Small-Vehicles-for-Autonomy (SVEA) hardware platform in two driving scenarios.
Submitted 30 March, 2022; v1 submitted 15 September, 2021;
originally announced September 2021.
-
A Computationally Efficient EK-PMBM Filter for Bistatic mmWave Radio SLAM
Authors:
Yu Ge,
Ossi Kaltiokallio,
Hyowon Kim,
Fan Jiang,
Jukka Talvitie,
Mikko Valkama,
Lennart Svensson,
Sunwoo Kim,
Henk Wymeersch
Abstract:
Millimeter wave (mmWave) signals are useful for simultaneous localization and mapping (SLAM), due to their inherent geometric connection to the propagation environment and the propagation channel. To solve the SLAM problem, existing approaches rely on sigma-point or particle-based approximations, leading to high computational complexity, precluding real-time execution. We propose a novel low-complexity SLAM filter, based on the Poisson multi-Bernoulli mixture (PMBM) filter. It utilizes the extended Kalman (EK) first-order Taylor series based Gaussian approximation of the filtering distribution, and applies the track-oriented marginal multi-Bernoulli/Poisson (TOMB/P) algorithm to approximate the resulting PMBM as a Poisson multi-Bernoulli (PMB). The filter can account for different landmark types in radio SLAM and multiple data association hypotheses. Hence, it has an adjustable complexity/performance trade-off. Simulation results show that the developed SLAM filter can greatly reduce the computational cost, while it keeps the good performance of mapping and user state estimation.
Submitted 8 September, 2021;
originally announced September 2021.
-
Printed Texts Tracking and Following for a Finger-Wearable Electro-Braille System Through Opto-electrotactile Feedback
Authors:
Mehdi Rahimi,
Yantao Shen,
Zhiming Liu,
Fang Jiang
Abstract:
This paper presents our recent development on a portable and refreshable text reading and sensory substitution system for the blind or visually impaired (BVI), called Finger-eye. The system mainly consists of an opto-text processing unit and a compact electro-tactile based display that can deliver text-related electrical signals to the fingertip skin through a wearable and Braille-dot patterned electrode array and thus delivers the electro-stimulation based Braille touch sensations to the fingertip. To achieve the goal of aiding BVI users to read any text not written in Braille through this portable system, in this work, a Rapid Optical Character Recognition (R-OCR) method is first developed for real-time processing of text information based on a fisheye imaging device mounted on the finger-wearable electro-tactile display. This allows real-time translation of printed text to electro-Braille along with natural movement of the user's fingertip, as if reading any Braille display or book. More importantly, an electro-tactile neuro-stimulation feedback mechanism is proposed and incorporated with the R-OCR method, which facilitates a new opto-electrotactile feedback based text line tracking control approach that enables text line following by the user's fingertip during reading. Multiple experiments were designed and conducted to test the ability of blindfolded participants to read through and follow the text line based on the opto-electrotactile-feedback method. The experiments show that, as a result of the opto-electrotactile feedback, the users were able to maintain their fingertip within 2 mm of the text while scanning a text line. This research is a significant step toward giving BVI users a portable means to translate any printed text to Braille and follow it while reading, whether in the digital realm or physically, on any surface.
Submitted 6 August, 2021;
originally announced September 2021.
-
Maximum F1-score training for end-to-end mispronunciation detection and diagnosis of L2 English speech
Authors:
Bi-Cheng Yan,
Shao-Wei Fan Jiang,
Fu-An Chao,
Berlin Chen
Abstract:
End-to-end (E2E) neural models are increasingly attracting attention as a promising modeling approach for mispronunciation detection and diagnosis (MDD). Typically, these models are trained by optimizing a cross-entropy criterion, which corresponds to improving the log-likelihood of the training data. However, there is a discrepancy between the objectives of model training and the MDD evaluation, since the performance of an MDD model is commonly evaluated in terms of F1-score instead of phone or word error rate (PER/WER). In view of this, in this paper we explore the use of a discriminative objective function for training E2E MDD models, which aims to maximize the expected F1-score directly. A series of experiments conducted on the L2-ARCTIC dataset show that our proposed method can yield considerable performance improvements in relation to some state-of-the-art E2E MDD approaches and the celebrated GOP method.
Submitted 9 July, 2022; v1 submitted 31 August, 2021;
originally announced August 2021.
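The abstract above mentions training to maximize the expected F1-score directly but does not give the objective. As a rough illustration only, and not the authors' formulation, one common way to make F1 usable as a training signal is a "soft" F1 in which predicted probabilities stand in for hard decisions. The function name `soft_f1` and the binary per-phone labels are assumptions made for this sketch.

```python
def soft_f1(probs, labels, eps=1e-8):
    """Soft F1 from predicted probabilities and binary labels.

    probs  -- predicted probability that each item is a mispronunciation
    labels -- 1 if the item is truly a mispronunciation, else 0
    Hypothetical helper for illustration; not taken from the paper.
    """
    # Probabilities replace hard 0/1 decisions, so the counts are fractional.
    tp = sum(p * y for p, y in zip(probs, labels))
    fp = sum(p * (1 - y) for p, y in zip(probs, labels))
    fn = sum((1 - p) * y for p, y in zip(probs, labels))
    # F1 = 2TP / (2TP + FP + FN); eps guards against division by zero.
    return 2 * tp / (2 * tp + fp + fn + eps)


if __name__ == "__main__":
    print(soft_f1([0.9, 0.2, 0.7, 0.1], [1, 0, 1, 0]))
```

With hard 0/1 predictions this reduces to the ordinary F1-score, which is why a quantity of this kind is a plausible stand-in when the evaluation metric itself is F1.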
-
Towards Robust Mispronunciation Detection and Diagnosis for L2 English Learners with Accent-Modulating Methods
Authors:
Shao-Wei Fan Jiang,
Bi-Cheng Yan,
Tien-Hong Lo,
Fu-An Chao,
Berlin Chen
Abstract:
With the acceleration of globalization, more and more people are willing or required to learn second languages (L2). One of the major remaining challenges facing current mispronunciation detection and diagnosis (MDD) models for use in computer-assisted pronunciation training (CAPT) is to handle speech from L2 learners with a diverse set of accents. In this paper, we set out to mitigate the adverse effects of accent variety in building an L2 English MDD system with end-to-end (E2E) neural models. To this end, we first propose an effective modeling framework that infuses accent features into an E2E MDD model, thereby making the model more accent-aware. Going a step further, we design and present disparate accent-aware modules to perform accent-aware modulation of acoustic features in a finer-grained manner, so as to enhance the discriminating capability of the resulting MDD model. Extensive sets of experiments conducted on the L2-ARCTIC benchmark dataset show the merits of our MDD model, in comparison to some existing E2E-based strong baselines and the celebrated pronunciation scoring based method.
Submitted 3 October, 2021; v1 submitted 26 August, 2021;
originally announced August 2021.
-
TENET: A Time-reversal Enhancement Network for Noise-robust ASR
Authors:
Fu-An Chao,
Shao-Wei Fan Jiang,
Bi-Cheng Yan,
Jeih-weih Hung,
Berlin Chen
Abstract:
Due to the unprecedented breakthroughs brought about by deep learning, speech enhancement (SE) techniques have been developed rapidly and play an important role prior to acoustic modeling to mitigate noise effects on speech. To increase the perceptual quality of speech, current state-of-the-art in the SE field adopts adversarial training by connecting an objective metric to the discriminator. However, there is no guarantee that optimizing the perceptual quality of speech will necessarily lead to improved automatic speech recognition (ASR) performance. In this study, we present TENET, a novel Time-reversal Enhancement NETwork, which leverages the transformation of an input noisy signal itself, i.e., the time-reversed version, in conjunction with the siamese network and complex dual-path transformer to promote SE performance for noise-robust ASR. Extensive experiments conducted on the Voicebank-DEMAND dataset show that TENET can achieve state-of-the-art results compared to a few top-of-the-line methods in terms of both SE and ASR evaluation metrics. To demonstrate the model generalization ability, we further evaluate TENET on the test set of scenarios contaminated with unseen noise, and the results also confirm the superiority of this promising method.
Submitted 14 September, 2021; v1 submitted 3 July, 2021;
originally announced July 2021.
-
Cooperative mmWave PHD-SLAM with Moving Scatterers
Authors:
Hyowon Kim,
Jaebok Lee,
Yu Ge,
Fan Jiang,
Sunwoo Kim,
Henk Wymeersch
Abstract:
Using the multiple-model (MM) probability hypothesis density (PHD) filter, millimeter wave (mmWave) radio simultaneous localization and mapping (SLAM) in vehicular scenarios is susceptible to movements of objects, in particular vehicles driving in parallel with the ego vehicle. We propose and evaluate two countermeasures to track vehicle scatterers (VSs) in mmWave radio MM-PHD-SLAM. First, locally at each vehicle, we generate and treat the VS map PHD in the context of Bayesian recursion, and modify vehicle state correction with the VS map PHD. Second, in the global map fusion process at the base station, we average the VS map PHD and upload it with self-vehicle posterior density, compute fusion weights, and prune the target with low Gaussian weight in the context of arithmetic average-based map fusion. From simulation results, the proposed cooperative mmWave radio MM-PHD-SLAM filter is shown to outperform the previous filter in VS scenarios.
Submitted 22 June, 2021; v1 submitted 21 June, 2021;
originally announced June 2021.
-
Detecting and Correcting IMU Movements During Joint Angle Estimation
Authors:
Chunzhi Yi,
Feng Jiang,
Baichun Wei,
Chifu Yang,
Zhen Ding,
Jubo Jin,
Jie Liu
Abstract:
Inertial measurement units (IMUs) increasingly function as a basic component of wearable sensor network (WSN) systems. IMU-based joint angle estimation (JAE) is a relatively typical usage of IMUs, with extensive applications. However, the issue that IMUs move with respect to their original placement during JAE is still a research gap, and limits the robustness of deploying the technique in real-world application scenarios. In this study, we propose to detect and correct the IMU movement online in a relatively computationally lightweight manner. In particular, we first experimentally investigate the influence of IMU movements. Second, we design the metrics for detecting IMU movements by mathematically formulating how the IMU movement affects the IMU measurements. Third, we determine the optimal thresholds of the metrics using synthetic IMU data from a significantly amended simulation model. Finally, a correction method is proposed to correct the effects of IMU movements. We demonstrate our method on both synthetic data and real-user data. The results demonstrate that our method is a promising solution for detecting and correcting IMU movements during JAE.
Submitted 9 June, 2021;
originally announced June 2021.
-
Optimal Spatial Signal Design for mmWave Positioning under Imperfect Synchronization
Authors:
Musa Furkan Keskin,
Fan Jiang,
Florent Munier,
Gonzalo Seco-Granados,
Henk Wymeersch
Abstract:
We consider the problem of spatial signal design for multipath-assisted mmWave positioning under limited prior knowledge on the user's location and clock bias. We propose an optimal robust design and, based on the low-dimensional precoder structure under perfect prior knowledge, a codebook-based heuristic design with optimized beam power allocation. Through numerical results, we characterize different position-error-bound (PEB) regimes with respect to clock bias uncertainty and show that the proposed low-complexity codebook-based designs outperform the conventional directional beam codebook and achieve near-optimal PEB performance for both analog and digital architectures.
Submitted 7 February, 2022; v1 submitted 17 May, 2021;
originally announced May 2021.
-
An Empirical Study on Channel Effects for Synthetic Voice Spoofing Countermeasure Systems
Authors:
You Zhang,
Ge Zhu,
Fei Jiang,
Zhiyao Duan
Abstract:
Spoofing countermeasure (CM) systems are critical in speaker verification; they aim to discern spoofing attacks from bona fide speech trials. In practice, however, acoustic condition variability in speech utterances may significantly degrade the performance of CM systems. In this paper, we conduct a cross-dataset study on several state-of-the-art CM systems and observe significant performance degradation compared with their single-dataset performance. Observing differences of average magnitude spectra of bona fide utterances across the datasets, we hypothesize that channel mismatch among these datasets is one important reason. We then verify it by demonstrating a similar degradation of CM systems trained on original but evaluated on channel-shifted data. Finally, we propose several channel robust strategies (data augmentation, multi-task learning, adversarial learning) for CM systems, and observe a significant performance improvement on cross-dataset experiments.
Submitted 10 October, 2021; v1 submitted 3 April, 2021;
originally announced April 2021.
-
Continuous Prediction of Lower-Limb Kinematics From Multi-Modal Biomedical Signals
Authors:
Chunzhi Yi,
Feng Jiang,
Shengping Zhang,
Hao Guo,
Chifu Yang,
Zhen Ding,
Baichun Wei,
Xiangyuan Lan,
Huiyu Zhou
Abstract:
The fast-growing techniques of measuring and fusing multi-modal biomedical signals enable advanced motor intent decoding schemes of lower-limb exoskeletons, meeting the increasing demand for rehabilitative or assistive applications of take-home healthcare. A remaining challenge for exoskeleton motor intent decoding schemes is making a continuous prediction to compensate for the hysteretic response caused by mechanical transmission. In this paper, we solve this problem by proposing an ahead-of-time continuous prediction of lower-limb kinematics, with the prediction of knee angles during level walking as a case study. Firstly, an end-to-end kinematics prediction network (KinPreNet), consisting of a feature extractor and an angle predictor, is proposed and experimentally compared with features and methods traditionally used in ahead-of-time prediction of gait phases. Secondly, inspired by the electromechanical delay (EMD), we further explore our algorithm's capability of compensating response delay of mechanical transmission by validating the performance of the different sections of prediction time. We also experimentally reveal the time boundary of compensating the hysteretic response. Thirdly, a comparison with and without EMG signals is performed to reveal the joint contributions of EMG and kinematic signals to the continuous prediction. During the experiments, EMG signals of nine muscles and knee angles calculated from inertial measurement unit (IMU) signals are recorded from ten healthy subjects. To the best of our knowledge, this is the first study of continuously predicting lower-limb kinematics in an ahead-of-time manner based on the electromechanical delay (EMD).
Submitted 22 March, 2021;
originally announced March 2021.
-
Online Control Synthesis for Uncertain Systems under Signal Temporal Logic Specifications
Authors:
Pian Yu,
Yulong Gao,
Frank J. Jiang,
Karl H. Johansson,
Dimos V. Dimarogonas
Abstract:
This paper studies the online control synthesis problem for uncertain discrete-time systems subject to signal temporal logic (STL) specifications. Different from existing techniques, this work proposes an approach based on STL, reachability analysis, and temporal logic trees. Firstly, a real-time version of STL semantics and a tube-based temporal logic tree (tTLT) are proposed. We show that the tTLT is an underapproximation for the STL formula, in the sense that a trajectory satisfying a tTLT also satisfies the corresponding STL formula. Secondly, an online control synthesis algorithm is designed. It is shown that when the STL formula is robustly satisfiable and the initial state of the system belongs to the initial root node of the tTLT, it is guaranteed that the trajectory generated by the control synthesis algorithm satisfies the STL formula. The effectiveness of the proposed approach is verified by a simulation example and a practical experiment.
Submitted 17 March, 2023; v1 submitted 16 March, 2021;
originally announced March 2021.
-
A Novel Hybrid Framework for Hourly PM2.5 Concentration Forecasting Using CEEMDAN and Deep Temporal Convolutional Neural Network
Authors:
Fuxin Jiang,
Chengyuan Zhang,
Shaolong Sun,
Jingyun Sun
Abstract:
For hourly PM2.5 concentration prediction, accurately capturing the data patterns of external factors that affect PM2.5 concentration changes and constructing a forecasting model around them is an efficient means to improve forecasting accuracy. In this study, a novel hybrid forecasting model based on complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and deep temporal convolutional neural network (DeepTCN) is developed to predict PM2.5 concentration, by modelling the data patterns of historical pollutant concentration data, meteorological data, and discrete time variables. Taking the PM2.5 concentration of Beijing as the sample, experimental results show that the proposed CEEMDAN-DeepTCN model achieves the highest forecasting accuracy when compared with the time series model, artificial neural network, and the popular deep learning models. The new model has improved the capability to model the PM2.5-related factor data patterns, and can be used as a promising tool for forecasting PM2.5 concentrations.
Submitted 7 December, 2020;
originally announced December 2020.
-
One-class Learning Towards Synthetic Voice Spoofing Detection
Authors:
You Zhang,
Fei Jiang,
Zhiyao Duan
Abstract:
Human voices can be used to authenticate the identity of the speaker, but automatic speaker verification (ASV) systems are vulnerable to voice spoofing attacks, such as impersonation, replay, text-to-speech, and voice conversion. Recently, researchers developed anti-spoofing techniques to improve the reliability of ASV systems against spoofing attacks. However, most methods encounter difficulties in detecting unknown attacks in practical use, which often have different statistical distributions from known attacks. In particular, the fast development of synthetic voice spoofing algorithms is generating increasingly powerful attacks, putting ASV systems at risk of unseen attacks. In this work, we propose an anti-spoofing system to detect unknown synthetic voice spoofing attacks (i.e., text-to-speech or voice conversion) using one-class learning. The key idea is to compact the bona fide speech representation and inject an angular margin to separate the spoofing attacks in the embedding space. Without resorting to any data augmentation methods, our proposed system achieves an equal error rate (EER) of 2.19% on the evaluation set of ASVspoof 2019 Challenge logical access scenario, outperforming all existing single systems (i.e., those without model ensemble).
Submitted 9 February, 2021; v1 submitted 26 October, 2020;
originally announced October 2020.
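The abstract above reports its result as an equal error rate (EER). For readers unfamiliar with the metric, the sketch below shows one simple way to estimate it from detection scores, assuming higher scores mean "more likely bona fide". The function name and the threshold sweep are illustrative assumptions; this is not the ASVspoof evaluation tooling and it does not reproduce the reported 2.19%.

```python
def equal_error_rate(bona_fide_scores, spoof_scores):
    """Estimate the EER by sweeping thresholds over the observed scores.

    Hypothetical helper for illustration. A trial is accepted as bona fide
    when its score is at or above the threshold.
    """
    thresholds = sorted(set(bona_fide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bona_fide_scores) / len(bona_fide_scores)
        # False acceptance rate: spoofed trials scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # The EER is where the two error rates meet; average them at
            # the closest observed threshold.
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    print(equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.1, 0.2, 0.4, 0.6]))
```

On small score sets the two error curves may not cross exactly at an observed threshold, so averaging them at the closest point is only an approximation; evaluation toolkits typically interpolate instead.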