Search | arXiv e-print repository

Switching Controller Synthesis for Hybrid Systems Against STL Formulas

Authors: Han Su, Shenghua Feng, Sinong Zhan, Naijun Zhan

Abstract: Switching controllers play a pivotal role in directing hybrid systems (HSs) towards the desired objective, embodying a "correct-by-construction" approach to HS design. Identifying these objectives is thus crucial for the synthesis of effective switching controllers. While most existing works focus on safety and liveness, few of them consider timing constraints. In this paper, we delve into the synthesis of switching controllers for HSs that meet system objectives given by a fragment of STL, which essentially corresponds to a reach-avoid problem with timing constraints. Our approach involves iteratively computing the state sets that can be driven to satisfy the reach-avoid specification with timing constraints. This technique supports the creation of switching controllers for both constant and non-constant HSs. We validate our method's soundness, and confirm its relative completeness for a certain subclass of HSs. Experimental results affirm the efficacy of our approach.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.11568 [pdf, other]

Towards an End-to-End Framework for Invasive Brain Signal Decoding with Large Language Models

Authors: Sheng Feng, Heyang Liu, Yu Wang, Yanfeng Wang

Abstract: In this paper, we introduce a groundbreaking end-to-end (E2E) framework for decoding invasive brain signals, marking a significant advancement in the field of speech neuroprosthesis. Our methodology leverages the comprehensive reasoning abilities of large language models (LLMs) to facilitate direct decoding. By fully integrating LLMs, we achieve results comparable to the state-of-the-art cascade models. Our findings underscore the immense potential of E2E frameworks in speech neuroprosthesis, particularly as the technology behind brain-computer interfaces (BCIs) and the availability of relevant datasets continue to evolve. This work not only showcases the efficacy of combining LLMs with E2E decoding for enhancing speech neuroprosthesis but also sets a new direction for future research in BCI applications, underscoring the impact of LLMs in decoding complex neural signals for communication restoration. Code will be made available at https://github.com/FsFrancis15/BrainLLM.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2405.08306 [pdf, other]

Flight Path Optimization with Optimal Control Method

Authors: Gaofeng Su, Xi Cheng, Siyuan Feng, Ke Liu, Jilin Song, Jianan Chen, Chen Zhu, Hui Lin

Abstract: This paper is based on a crucial issue in the aviation world: how to optimize the trajectory and controls given to the aircraft in order to optimize flight time and fuel consumption. This study aims to provide elements of a response to this problem and to define, under certain simplifying assumptions, an optimal response, using Constrained Finite Time Optimal Control (CFTOC). The first step is to define the dynamic model of the aircraft in accordance with the controllable inputs and wind disturbances. Then we identify a precise optimization objective and implement an optimization program to solve it in a simulated real-flight situation. Finally, the optimization result is validated and discussed across different scenarios.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.06230 [pdf]

Fire in SRRN: Next-Gen 3D Temperature Field Reconstruction Technology

Authors: Shenxiang Feng, Xiaojian Hao, Xiaodong Huang, Pan Pei, Tong Wei, Chenyang Xu

Abstract: In aerospace and energy engineering, accurate 3D combustion field temperature measurement is critical. The resolution of traditional methods based on algebraic iteration is limited by the initial voxel division. This study introduces a novel method for reconstructing three-dimensional temperature fields using the Spatial Radiation Representation Network (SRRN). This method utilizes the flame thermal radiation characteristics and differentiable rendering in graphics, and combines it with a multi-layer perceptron to achieve a functional representation of the flame temperature field. The effectiveness of SRRN is evaluated through simulated temperature field reconstruction experiments with different levels of complexity. The maximum root mean square error is 10.17, which proves the robustness of the algorithm to Gaussian noise and salt-and-pepper noise. We conducted a butane flame temperature field reconstruction experiment, and the maximum relative error between the reconstruction result and the thermocouple measurement value was 4.86%, confirming that the algorithm can achieve accurate reconstruction.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2404.13748 [pdf, other]

Application of Kalman Filter in Stochastic Differential Equations

Authors: Wencheng Bao, Shi Feng, Kaiwen Zhang

Abstract: In areas such as finance, engineering, and science, we often face situations that change quickly and unpredictably. These situations are tough to handle and require special tools and methods capable of understanding and predicting what might happen next. Stochastic Differential Equations (SDEs) are renowned for modeling and analyzing real-world dynamical systems. However, obtaining the parameters, boundary conditions, and closed-form solutions of SDEs can often be challenging. In this paper, we will discuss the application of Kalman filtering theory to SDEs, including Extended Kalman filtering and Particle Extended Kalman filtering. We will explore how to fit existing SDE systems through filtering and track the original SDEs by fitting the obtained closed-form solutions. This approach aims to gather more information about these SDEs, which could be used in various ways, such as incorporating them into parameters of data-based SDE models.

Submitted 21 April, 2024; originally announced April 2024.

Comments: 18 pages, 14 figures

arXiv:2404.09500 [pdf]

On-chip Real-time Hyperspectral Imager with Full CMOS Resolution Enabled by Massively Parallel Neural Network

Authors: Junren Wen, Haiqi Gao, Weiming Shi, Shuaibo Feng, Lingyun Hao, Yujie Liu, Liang Xu, Yuchuan Shao, Yueguang Zhang, Weidong Shen, Chenying Yang

Abstract: Traditional spectral imaging methods are constrained by the time-consuming scanning process, limiting the application in dynamic scenarios. One-shot spectral imaging based on reconstruction has been a hot research topic recently and the primary challenges still lie in both efficient fabrication techniques suitable for mass production and the high-speed, high-accuracy reconstruction algorithm for real-time spectral imaging. In this study, we introduce an innovative on-chip real-time hyperspectral imager that leverages nanophotonic film spectral encoders and a Massively Parallel Network (MP-Net), featuring a 4 * 4 array of compact, all-dielectric film units for the micro-spectrometers. Each curved nanophotonic film unit uniquely modulates incident light across the underlying 3 * 3 CMOS image sensor (CIS) pixels, enabling a high spatial resolution equivalent to the full CMOS resolution. The implementation of MP-Net, specially designed to address variability in transmittance and manufacturing errors such as misalignment and non-uniformities in thin film deposition, can greatly increase the structural tolerance of the device and reduce the preparation requirement, further simplifying the manufacturing process. Tested in varied environments on both static and moving objects, the real-time hyperspectral imager demonstrates robustness and high-fidelity spatial-spectral data capabilities across diverse scenarios. This on-chip hyperspectral imager represents a significant advancement in real-time, high-resolution spectral imaging, offering a versatile solution for applications ranging from environmental monitoring and remote sensing to consumer electronics.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.07959 [pdf]

Damage identification of offshore jacket platforms in a digital twin framework considering optimal sensor placement

Authors: Mengmeng Wang, Atilla Incecik, Shizhe Feng, M. K. Gupta, Grzegorz Krlolczyk, Z Li

Abstract: A new digital twin (DT) framework with optimal sensor placement (OSP) is proposed to accurately calculate the modal responses and identify the damage ratios of the offshore jacket platforms. The proposed damage identification framework consists of two models (namely one OSP model and one damage identification model). The OSP model adopts the multi-objective Lichtenberg algorithm (MOLA) to perform the sensor number/location optimization to make a good balance between the sensor cost and the modal calculation accuracy. In the damage identification model, the Markov Chain Monte Carlo (MCMC)-Bayesian method is developed to calculate the structural damage ratios based on the modal information obtained from the sensory measurements, where the uncertainties of the structural parameters are quantified. The proposed method is validated using an offshore jacket platform, and the analysis results demonstrate efficient identification of the structural damage location and severity.

Submitted 26 March, 2024; originally announced April 2024.

arXiv:2402.19275 [pdf, other]

Adaptive Testing Environment Generation for Connected and Automated Vehicles with Dense Reinforcement Learning

Authors: Jingxuan Yang, Ruoxuan Bai, Haoyuan Ji, Yi Zhang, Jianming Hu, Shuo Feng

Abstract: The assessment of safety performance plays a pivotal role in the development and deployment of connected and automated vehicles (CAVs). A common approach involves designing testing scenarios based on prior knowledge of CAVs (e.g., surrogate models), conducting tests in these scenarios, and subsequently evaluating CAVs' safety performances. However, substantial differences between CAVs and the prior knowledge can significantly diminish the evaluation efficiency. In response to this issue, existing studies predominantly concentrate on the adaptive design of testing scenarios during the CAV testing process. Yet, these methods have limitations in their applicability to high-dimensional scenarios. To overcome this challenge, we develop an adaptive testing environment that bolsters evaluation robustness by incorporating multiple surrogate models and optimizing the combination coefficients of these surrogate models to enhance evaluation efficiency. We formulate the optimization problem as a regression task utilizing quadratic programming. To efficiently obtain the regression target via reinforcement learning, we propose the dense reinforcement learning method and devise a new adaptive policy with high sample efficiency. Essentially, our approach centers on learning the values of critical scenes displaying substantial surrogate-to-real gaps. The effectiveness of our method is validated in high-dimensional overtaking scenarios, demonstrating that our approach achieves notable evaluation efficiency.

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.01795 [pdf, other]

Few-Shot Scenario Testing for Autonomous Vehicles Based on Neighborhood Coverage and Similarity

Authors: Shu Li, Jingxuan Yang, Honglin He, Yi Zhang, Jianming Hu, Shuo Feng

Abstract: Testing and evaluating the safety performance of autonomous vehicles (AVs) is essential before large-scale deployment. Practically, the number of testing scenarios permissible for a specific AV is severely limited by tight constraints on testing budgets and time. Under such strictly limited numbers of tests, existing testing methods often lead to significant uncertainty or difficulty in quantifying evaluation results. In this paper, we formulate this problem for the first time as the "few-shot testing" (FST) problem and propose a systematic framework to address this challenge. To alleviate the considerable uncertainty inherent in a small testing scenario set, we frame the FST problem as an optimization problem and search for the testing scenario set based on neighborhood coverage and similarity. Specifically, under the guidance of better generalization ability of the testing scenario set on AVs, we dynamically adjust this set and the contribution of each testing scenario to the evaluation result based on coverage, leveraging the prior information of surrogate models (SMs). With certain hypotheses on SMs, a theoretical upper bound of evaluation error is established to verify the sufficiency of evaluation accuracy within the given limited number of tests. The experiment results on cut-in scenarios demonstrate a notable reduction in evaluation error and variance of our method compared to conventional testing methods, especially for situations with a strict limit on the number of scenarios.

Submitted 22 April, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

arXiv:2312.15416 [pdf, other]

On Completeness of SDP-Based Barrier Certificate Synthesis over Unbounded Domains

Authors: Hao Wu, Shenghua Feng, Ting Gan, Jie Wang, Bican Xia, Naijun Zhan

Abstract: Barrier certificates, serving as differential invariants that witness system safety, play a crucial role in the verification of cyber-physical systems (CPS). Prevailing computational methods for synthesizing barrier certificates are based on semidefinite programming (SDP) by exploiting Putinar's Positivstellensatz. Consequently, these approaches are limited by the Archimedean condition, which requires all variables to be bounded, i.e., systems are defined over bounded domains. For systems over unbounded domains, unfortunately, existing methods become incomplete and may fail to identify potential barrier certificates. In this paper, we address this limitation for the unbounded cases. We first give a complete characterization of polynomial barrier certificates by using homogenization, a recent technique in the optimization community to reduce an unbounded optimization problem to a bounded one. Furthermore, motivated by this formulation, we introduce the definition of homogenized systems and propose a complete characterization of a family of non-polynomial barrier certificates with more expressive power. Experimental results demonstrate that our two approaches are more effective while maintaining a comparable level of efficiency.

Submitted 26 April, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

Comments: 18 pages, 1 figure

arXiv:2311.07418 [pdf, other]

Speech-based Slot Filling using Large Language Models

Authors: Guangzhi Sun, Shutong Feng, Dongcheng Jiang, Chao Zhang, Milica Gašić, Philip C. Woodland

Abstract: Recently, advancements in large language models (LLMs) have shown an unprecedented ability across various language tasks. This paper investigates the potential application of LLMs to slot filling with noisy ASR transcriptions, via both in-context learning and task-specific fine-tuning. Dedicated prompt designs and fine-tuning approaches are proposed to improve the robustness of LLMs for slot filling with noisy ASR transcriptions. Moreover, a linearised knowledge injection (LKI) scheme is also proposed to integrate dynamic external knowledge into LLMs. Experiments were performed on SLURP to quantify the performance of LLMs, including GPT-3.5-turbo, GPT-4, LLaMA-13B and Vicuna-13B (v1.1 and v1.5) with different ASR error rates. The use of the proposed fine-tuning together with the LKI scheme for LLaMA-13B achieved an 8.3% absolute SLU-F1 improvement compared to the strong Flan-T5-base baseline system on a limited data setup.

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2311.00263 [pdf, other]

The bottleneck and ceiling effects in quantized tracking control of heterogeneous multi-agent systems under DoS attacks

Authors: Shuai Feng, Maopeng Ran, Baoyong Zhang, Lihua Xie, Shengyuan Xu

Abstract: In this paper, we investigate tracking control of heterogeneous multi-agent systems under Denial-of-Service (DoS) attacks and state quantization. Dynamic quantized mechanisms are designed for inter-follower communication and leader-follower communication. Zooming-in and out factors, and data rates of both mechanisms for preventing quantizer saturation are provided. Our results show that by tuning the inter-follower quantized controller, one cannot improve the resilience beyond a level determined by the data rate of leader-follower quantized communication, i.e., the ceiling effect. Otherwise, overflow of followers' state quantizer can occur. On the other hand, if one selects a "large" data rate for leader-follower quantized communication, then the inter-follower quantized communication determines the resilience, and further increasing the data rate for leader-follower quantized communication cannot improve the resilience, i.e., the bottleneck effect. Simulation examples are provided to justify the results of our paper.

Submitted 31 October, 2023; originally announced November 2023.

arXiv:2309.05908 [pdf, other]

Reset Controller Synthesis by Reach-avoid Analysis for Delay Hybrid Systems

Authors: Han Su, Jiyu Zhu, Shenghua Feng, Yunjun Bai, Bin Gu, Jiang Liu, Mengfei Yang, Naijun Zhan

Abstract: A reset controller plays a crucial role in designing hybrid systems. It restricts the initial set and redefines the reset map associated with discrete transitions, in order to guarantee that the system achieves its objective. Reset controller synthesis, together with feedback controller synthesis and switching logic controller synthesis, provides a correct-by-construction approach to designing hybrid systems. However, time-delay is an inevitable factor in hybrid systems, which can degrade control performance and render verification certificates obtained by abstracting away time-delay invalid in practice. In this paper, we investigate this issue in a practical manner by taking time-delay into account. We propose an approach that reduces the synthesis of reset controllers to the generation of reach-avoid sets for the hybrid system under consideration, which can be efficiently solved using off-the-shelf convex optimization solvers.

Submitted 27 May, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

Comments: 15 pages, 10 figures

arXiv:2308.12617 [pdf, ps, other]

Quantized distributed Nash equilibrium seeking under DoS attacks: A quantized consensus based approach

Authors: Shuai Feng, Maojiao Ye, Lihua Xie, Shengyuan Xu

Abstract: This paper studies distributed Nash equilibrium (NE) seeking under Denial-of-Service (DoS) attacks and quantization. The players can only exchange information with their own direct neighbors. The transmitted information is subject to quantization and packet losses induced by malicious DoS attacks. We propose a quantized distributed NE seeking strategy based on the approach of dynamic quantized consensus. To solve the quantizer saturation problem caused by DoS attacks, the quantization mechanism is equipped to have zooming-in and holding capabilities, in which the holding capability is consistent with the results in quantized consensus under DoS. A sufficient condition on the number of quantizer levels is provided, under which the quantizers are free from saturation under DoS attacks. The proposed distributed quantized NE seeking strategy is shown to have the so-called maximum resilience to DoS attacks. Namely, if the bound characterizing the maximum resilience is violated, an attacker can deny all the transmissions and hence distributed NE seeking is impossible.

Submitted 24 August, 2023; originally announced August 2023.

arXiv:2307.15388 [pdf, other]

An Empirical Study of Large-Scale Data-Driven Full Waveform Inversion

Authors: Peng Jin, Yinan Feng, Shihang Feng, Hanchen Wang, Yinpeng Chen, Benjamin Consolvo, Zicheng Liu, Youzuo Lin

Abstract: This paper investigates the impact of big data on deep learning models to help solve the full waveform inversion (FWI) problem. While it is well known that big data can boost the performance of deep learning models in many tasks, its effectiveness has not been validated for FWI. To address this gap, we present an empirical study that investigates how deep learning models in FWI behave when trained on OpenFWI, a collection of large-scale, multi-structural, synthetic datasets published recently. In particular, we train and evaluate the FWI models on a combination of 10 2D subsets in OpenFWI that contain 470K pairs of seismic data and velocity maps in total. Our experiments demonstrate that training on the combined dataset yields an average improvement of 13.03% in MAE, 7.19% in MSE and 1.87% in SSIM compared to each split dataset, and an average improvement of 28.60%, 21.55% and 8.22% in the leave-one-out generalization test. We further demonstrate that model capacity needs to scale in accordance with data size for optimal improvement, where our largest model yields an average improvement of 20.06%, 13.39% and 0.72% compared to the smallest one.

Submitted 24 April, 2024; v1 submitted 28 July, 2023; originally announced July 2023.

arXiv:2306.02982 [pdf, other]

PolyVoice: Language Models for Speech to Speech Translation

Authors: Qianqian Dong, Zhiying Huang, Qiao Tian, Chen Xu, Tom Ko, Yunlong Zhao, Siyuan Feng, Tang Li, Kexin Wang, Xuxin Cheng, Fengpeng Yue, Ye Bai, Xi Chen, Lu Lu, Zejun Ma, Yuping Wang, Mingxuan Wang, Yuxuan Wang

Abstract: We propose PolyVoice, a language model-based framework for speech-to-speech translation (S2ST) system. Our framework consists of two language models: a translation language model and a speech synthesis language model. We use discretized speech units, which are generated in a fully unsupervised way, and thus our framework can be used for unwritten languages. For the speech synthesis part, we adopt the existing VALL-E X approach and build a unit-based audio language model. This grants our framework the ability to preserve the voice characteristics and the speaking style of the original speech. We examine our system on Chinese $\rightarrow$ English and English $\rightarrow$ Spanish pairs. Experimental results show that our system can generate speech with high translation quality and audio quality. Speech samples are available at https://speechtranslation.github.io/polyvoice.

Submitted 13 June, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

arXiv:2306.00279 [pdf, other]

doi 10.1109/TCNS.2023.3281555

Dynamic quantized consensus under DoS attacks: Towards a tight zooming-out factor

Authors: Shuai Feng, Maopeng Ran, Hideaki Ishii, Shengyuan Xu

Abstract: This paper deals with dynamic quantized consensus of dynamical agents in a general form under packet losses induced by Denial-of-Service (DoS) attacks. The communication channel has limited bandwidth and hence the transmitted signals over the network are subject to quantization. To deal with agent's output, an observer is implemented at each node. The state of the observer is quantized by a finite-level quantizer and then transmitted over the network. To solve the problem of quantizer overflow under malicious packet losses, a zooming-in and out dynamic quantization mechanism is designed. By the new quantized controller proposed in the paper, the zooming-out factor is lower bounded by the spectral radius of the agent's dynamic matrix. A sufficient condition of quantization range is provided under which the finite-level quantizer is free of overflow. A sufficient condition of tolerable DoS attacks for achieving consensus is also provided. At last, we study scalar dynamical agents as a special case and further tighten the zooming-out factor to a value smaller than the agent's dynamic parameter. Under such a zooming-out factor, it is possible to recover the level of tolerable DoS attacks to that of unquantized consensus, and the quantizer is free of overflow.

Submitted 31 May, 2023; originally announced June 2023.

arXiv:2305.15719 [pdf, other]

Efficient Neural Music Generation

Authors: Max W. Y. Lam, Qiao Tian, Tang Li, Zongyu Yin, Siyuan Feng, Ming Tu, Yuliang Ji, Rui Xia, Mingbo Ma, Xuchen Song, Jitong Chen, Yuping Wang, Yuxuan Wang

Abstract: Recent progress in music generation has been remarkably advanced by the state-of-the-art MusicLM, which comprises a hierarchy of three LMs, respectively, for semantic, coarse acoustic, and fine acoustic modelings. Yet, sampling with the MusicLM requires processing through these LMs one by one to obtain the fine-grained acoustic tokens, making it computationally expensive and prohibitive for real-time generation. Efficient music generation with a quality on par with MusicLM remains a significant challenge. In this paper, we present MeLoDy (M for music; L for LM; D for diffusion), an LM-guided diffusion model that generates music audio of state-of-the-art quality while reducing the forward passes in MusicLM by 95.7% or 99.6%, respectively, for sampling 10s or 30s music. MeLoDy inherits the highest-level LM from MusicLM for semantic modeling, and applies a novel dual-path diffusion (DPD) model and an audio VAE-GAN to efficiently decode the conditioning semantic tokens into waveform. DPD is proposed to simultaneously model the coarse and fine acoustics by incorporating the semantic information into segments of latents effectively via cross-attention at each denoising step. Our experimental results suggest the superiority of MeLoDy, not only in its practical advantages on sampling speed and infinitely continuable generation, but also in its state-of-the-art musicality, audio quality, and text correlation. Our samples are available at https://Efficient-MeLoDy.github.io/.

Submitted 25 May, 2023; originally announced May 2023.

arXiv:2305.13314 [pdf, other]

Auto-Linear Phenomenon in Subsurface Imaging

Authors: Yinan Feng, Yinpeng Chen, Peng Jin, Shihang Feng, Zicheng Liu, Youzuo Lin

Abstract: Subsurface imaging involves solving full waveform inversion (FWI) to predict geophysical properties from measurements. This problem can be reframed as an image-to-image translation, with the usual approach being to train an encoder-decoder network using paired data from two domains: geophysical property and measurement. A recent seminal work (InvLINT) demonstrates there is only a linear mapping between the latent spaces of the two domains, and the decoder requires paired data for training. This paper extends this direction by demonstrating that only linear mapping necessitates paired data, while both the encoder and decoder can be learned from their respective domains through self-supervised learning. This unveils an intriguing phenomenon (named Auto-Linear) where the self-learned features of two separate domains are automatically linearly correlated. Compared with existing methods, our Auto-Linear has four advantages: (a) solving both forward and inverse modeling simultaneously, (b) applicable to different subsurface imaging tasks and achieving markedly better results than previous methods, (c) enhanced performance, especially in scenarios with limited paired data and in the presence of noisy data, and (d) strong generalization ability of the trained encoder and decoder.

Submitted 21 May, 2024; v1 submitted 27 April, 2023; originally announced May 2023.

arXiv:2305.11576 [pdf, other]

Language-universal phonetic encoder for low-resource speech recognition

Authors: Siyuan Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang

Abstract: Multilingual training is effective in improving low-resource ASR, which may partially be explained by phonetic representation sharing between languages. In end-to-end (E2E) ASR systems, graphemes are often used as basic modeling units, however graphemes may not be ideal for multilingual phonetic sharing. In this paper, we leverage International Phonetic Alphabet (IPA) based language-universal phonetic model to improve low-resource ASR performances, for the first time within the attention encoder-decoder architecture. We propose an adaptation method on the phonetic IPA model to further improve the proposed approach on extreme low-resource languages. Experiments carried out on the open-source MLS corpus and our internal databases show our approach outperforms baseline monolingual models and most state-of-the-art works. Our main approach and adaptation are effective on extremely low-resource languages, even within domain- and language-mismatched scenarios.

Submitted 19 May, 2023; originally announced May 2023.

Comments: Accepted for publication in INTERSPEECH 2023

arXiv:2305.11569 [pdf, ps, other]

Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition

Authors: Siyuan Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang

Abstract: We improve low-resource ASR by integrating the ideas of multilingual training and self-supervised learning. Concretely, we leverage an International Phonetic Alphabet (IPA) multilingual model to create frame-level pseudo labels for unlabeled speech, and use these pseudo labels to guide hidden-unit BERT (HuBERT) based speech pretraining in a phonetically-informed manner. The experiments on the Multilingual Speech (MLS) Corpus show that the proposed approach consistently outperforms the standard HuBERT on all the target languages. Moreover, on 3 of the 4 languages, comparing to the standard HuBERT, the approach performs better, meanwhile is able to save supervised training data by 1.5k hours (75%) at most. Our approach outperforms most of the state of the arts, with much less pretraining data in terms of hours and language diversity. Compared to XLSR-53 and a retraining based multilingual method, our approach performs better with full and limited finetuning data scenarios.

Submitted 19 May, 2023; originally announced May 2023.

Comments: Accepted for publication in INTERSPEECH 2023

arXiv:2303.14278 [pdf, other]

Safe Hierarchical Navigation in Crowded Dynamic Uncertain Environments

Authors: Hongyi Chen, Shiyu Feng, Ye Zhao, Changliu Liu, Patricio A. Vela

Abstract: This paper describes a hierarchical solution consisting of a multi-phase planner and a low-level safe controller to jointly solve the safe navigation problem in crowded, dynamic, and uncertain environments. The planner employs dynamic gap analysis and trajectory optimization to achieve collision avoidance with respect to the predicted trajectories of dynamic agents within the sensing and planning horizon and with robustness to agent uncertainty. To address uncertainty over the planning horizon and real-time safety, a fast reactive safe set algorithm (SSA) is adopted, which monitors and modifies the unsafe control during trajectory tracking. Compared to other existing methods, our approach offers theoretical guarantees of safety and achieves collision-free navigation with higher probability in uncertain environments, as demonstrated in scenarios with 20 and 50 dynamic agents. Project website: https://hychen-naza.github.io/projects/HDAGap/.

Submitted 24 March, 2023; originally announced March 2023.

arXiv:2303.08243 [pdf, other]

Safer Gap: A Gap-based Local Planner for Safe Navigation with Nonholonomic Mobile Robots

Authors: Shiyu Feng, Ahmad Abuaish, Patricio A. Vela

Abstract: This paper extends the gap-based navigation technique in Potential Gap by guaranteeing safety for nonholonomic robots for all tiers of the local planner hierarchy, so called Safer Gap. The first tier generates a Bezier-based collision-free path through gaps. A subset of navigable free-space from the robot through a gap, called the keyhole, is defined to be the union of the largest collision-free disc centered on the robot and a trapezoidal region directed through the gap. It is encoded by a shallow neural network zeroing barrier function (ZBF). Nonlinear model predictive control (NMPC), with Keyhole ZBF constraints and output tracking of the Bezier path, synthesizes a safe kinematically-feasible trajectory. Low-level use of the Keyhole ZBF within a point-wise optimization-based safe control synthesis module serves as a final safety layer. Simulation and experimental validation of Safer Gap confirm its collision-free navigation properties.

Submitted 14 March, 2023; originally announced March 2023.

Comments: Submitted to IROS 2023

arXiv:2302.08010 [pdf, other]

Achieving Covert Communication in Large-Scale SWIPT-Enabled D2D Networks

Authors: Shaohan Feng, Xiao Lu, Dusit Niyato, Ekram Hossain, Sumei Sun

Abstract: We aim to secure a large-scale device-to-device (D2D) network against adversaries. The D2D network underlays a downlink cellular network to reuse the cellular spectrum and is enabled for simultaneous wireless information and power transfer (SWIPT). In the D2D network, the transmitters communicate with the receivers, and the receivers extract information and energy from their received radio-frequency (RF) signals. In the meantime, the adversaries aim to detect the D2D transmission. The D2D network applies power control and leverages the cellular signal to achieve covert communication (i.e., hide the presence of transmissions) so as to defend against the adversaries. We model the interaction between the D2D network and adversaries by using a two-stage Stackelberg game. Therein, the adversaries are the followers minimizing their detection errors at the lower stage and the D2D network is the leader maximizing its network utility constrained by the communication covertness and power outage at the upper stage. Both power splitting (PS)-based and time switch (TS)-based SWIPT schemes are explored. We characterize the spatial configuration of the large-scale D2D network, adversaries, and cellular network by stochastic geometry. We analyze the adversary's detection error minimization problem and adopt the Rosenbrock method to solve it, where the obtained solution is the best response from the lower stage. Taking into account the best response from the lower stage, we develop a bi-level algorithm to solve the D2D network's constrained network utility maximization problem and obtain the Stackelberg equilibrium. We present numerical results to reveal interesting insights.

Submitted 15 February, 2023; originally announced February 2023.

arXiv:2302.01745 [pdf, other]

Covert D2D Communication Underlaying Cellular Network: A System-Level Security Perspective

Authors: Shaohan Feng, Xiao Lu, Kun Zhu, Dusit Niyato, Ping Wang

Abstract: In this paper, we aim to secure the D2D communication of the D2D-underlaid cellular network by leveraging covert communication to hide its presence from the vigilant adversary. In particular, there are adversaries aiming to detect D2D communications based on their received signal powers. To avoid being detected, the legitimate entity, i.e., D2D-underlaid cellular network, performs power control so as to hide the presence of the D2D communication. We model the combat between the adversaries and the legitimate entity as a two-stage Stackelberg game. Therein, the adversaries are the followers and aim to minimize their detection errors at the lower stage while the legitimate entity is the leader and aims to maximize its utility constrained by the D2D communication covertness and the cellular quality of service (QoS) at the upper stage. Different from the conventional works, the study of the combat is conducted from the system-level perspective, where the scenario that a large-scale D2D-underlaid cellular network threatened by massive spatially distributed adversaries is considered and the network spatial configuration is modeled by stochastic geometry. We obtain the adversary's optimal strategy as the best response from the lower stage and also both analytically and numerically verify its optimality. Taking into account the best response from the lower stage, we design a bi-level algorithm based on the successive convex approximation (SCA) method to search for the optimal strategy of the legitimate entity, which together with the best response from the lower stage constitute the Stackelberg equilibrium. Numerical results are presented to evaluate the network performance and reveal practical insights that instead of improving the legitimate utility by strengthening the D2D link reliability, increasing D2D transmission power will degrade it due to the security concern.

Submitted 27 January, 2023; originally announced February 2023.

arXiv:2212.08391 [pdf, ps, other]

Enhanced-rate Iterative Beamformers for Active IRS-assisted Wireless Communications

Authors: Yeqing Lin, Feng Shu, Rongen Dong, Riqing Chen, Siling Feng, Weiping Shi, Jing Liu, Jiangzhou Wang

Abstract: Compared to passive intelligent reflecting surface (IRS), active IRS is viewed as a more efficient promising technique to combat the double-fading impact in IRS-aided wireless network. In this paper, in order to boost the achievable rate of user in such a wireless network, three enhanced-rate iterative beamforming methods are proposed by designing the amplifying factors and the corresponding phases at active IRS. The first method, maximizing the simplified signal-to-noise ratio (Max-SSNR) is designed by omitting the cross-term in the definition of rate. Using the Rayleigh-Ritz (RR) theorem, Max-SSNR-RR is proposed to iteratively optimize the norm of beamforming vector and its associated normalized vector. In addition, generalized maximum ratio reflection (GMRR) is presented with a closed-form expression, which is motivated by the maximum ratio combining. To further improve rate, maximizing SNR (Max-SNR) is designed by fractional programming (FP), which is called Max-SNR-FP. Simulation results show that the proposed three methods make an obvious rate enhancement over Max-reflecting signal-to-noise ratio (Max-RSNR), maximum ratio reflection (MRR), selective ratio reflecting (SRR), equal gain reflection (EGR) and passive IRS, and are in increasing order of rate performance as follows: Max-SSNR-RR, GMRR, and Max-SNR-FP.

Submitted 14 May, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

arXiv:2212.00517 [pdf, other]

doi 10.1109/TITS.2023.3317078

Adaptive Safety Evaluation for Connected and Automated Vehicles with Sparse Control Variates

Authors: Jingxuan Yang, Haowei Sun, Honglin He, Yi Zhang, Shuo Feng, Henry X. Liu

Abstract: Safety performance evaluation is critical for developing and deploying connected and automated vehicles (CAVs). One prevailing way is to design testing scenarios using prior knowledge of CAVs, test CAVs in these scenarios, and then evaluate their safety performances. However, significant differences between CAVs and prior knowledge could severely reduce the evaluation efficiency. Towards addressing this issue, most existing studies focus on the adaptive design of testing scenarios during the CAV testing process, but so far they cannot be applied to high-dimensional scenarios. In this paper, we focus on the adaptive safety performance evaluation by leveraging the testing results, after the CAV testing process. It can significantly improve the evaluation efficiency and be applied to high-dimensional scenarios. Specifically, instead of directly evaluating the unknown quantity (e.g., crash rates) of CAV safety performances, we evaluate the differences between the unknown quantity and known quantity (i.e., control variates). By leveraging the testing results, the control variates could be well designed and optimized such that the differences are close to zero, so the evaluation variance could be dramatically reduced for different CAVs. To handle the high-dimensional scenarios, we propose the sparse control variates method, where the control variates are designed only for the sparse and critical variables of scenarios. According to the number of critical variables in each scenario, the control variates are stratified into strata and optimized within each stratum using multiple linear regression techniques. We justify the proposed method's effectiveness by rigorous theoretical analysis and empirical study of high-dimensional overtaking scenarios.

Submitted 1 December, 2022; originally announced December 2022.
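
Note: the control-variates idea summarized in the abstract above can be illustrated with a minimal single-variable sketch. This is not the authors' sparse, stratified method; it is the textbook one-variable control-variate estimator applied to synthetic data. The function names `control_variate_estimate` and `simulate`, and the toy data model (a uniform control variable with known mean 0.5 and a noisy linear response), are assumptions made purely for illustration.

```python
import random
import statistics


def control_variate_estimate(y, x, x_mean):
    """Estimate E[y] using x, whose true mean x_mean is known, as a control variate."""
    n = len(y)
    y_bar = statistics.fmean(y)
    x_bar = statistics.fmean(x)
    # Regression coefficient beta = Cov(x, y) / Var(x), fitted from the samples.
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    beta = cov_xy / statistics.variance(x)
    # Subtract the part of each result that the control variate explains.
    adjusted = [yi - beta * (xi - x_mean) for xi, yi in zip(x, y)]
    return statistics.fmean(adjusted)


def simulate(trials=200, n=100, seed=0):
    """Compare the spread of the plain mean and the adjusted mean over repeated runs."""
    rng = random.Random(seed)
    plain, adjusted = [], []
    for _ in range(trials):
        x = [rng.random() for _ in range(n)]              # hypothetical control variable
        y = [2.0 * xi + rng.gauss(0.0, 0.1) for xi in x]  # hypothetical test result
        plain.append(statistics.fmean(y))
        adjusted.append(control_variate_estimate(y, x, 0.5))
    return statistics.variance(plain), statistics.variance(adjusted)


if __name__ == "__main__":
    v_plain, v_adjusted = simulate()
    print("variance of plain mean:   ", v_plain)
    print("variance of adjusted mean:", v_adjusted)
```

In this toy setting the adjusted estimator should show a much smaller variance than the plain sample mean, because most of the variation in the result is explained by the control variable. How much variance reduction is achieved in practice depends on how strongly the control variates correlate with the quantity being evaluated.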

arXiv:2210.10506 [pdf, other]

Audio Tampering Detection Based on Shallow and Deep Feature Representation Learning

Authors: Zhifeng Wang, Yao Yang, Chunyan Zeng, Shuai Kong, Shixiong Feng, Nan Zhao

Abstract: Digital audio tampering detection can be used to verify the authenticity of digital audio. However, most current methods use standard electronic network frequency (ENF) databases for visual comparison analysis of ENF continuity of digital audio or perform feature extraction for classification by machine learning methods. ENF databases are usually tricky to obtain, visual methods have weak feature representation, and machine learning methods have more information loss in features, resulting in low detection accuracy. This paper proposes a fusion method of shallow and deep features to fully use ENF information by exploiting the complementary nature of features at different levels to more accurately describe the changes in inconsistency produced by tampering operations to raw digital audio. The method achieves 97.03% accuracy on three classic databases: Carioca 1, Carioca 2, and New Spanish. In addition, we have achieved an accuracy of 88.31% on the newly constructed database GAUDI-DI. Experimental results show that the proposed method is superior to the state-of-the-art method.

Submitted 19 October, 2022; originally announced October 2022.

Comments: Audio tampering detection, 21 pages, 4 figures

arXiv:2209.15170 [pdf, other]

Securing Large-Scale D2D Networks Using Covert Communication and Friendly Jamming

Authors: Shaohan Feng, Xiao Lu, Sumei Sun, Dusit Niyato, Ekram Hossain

Abstract: We exploit both covert communication and friendly jamming to propose a friendly jamming-assisted covert communication and use it to doubly secure a large-scale device-to-device (D2D) network against eavesdroppers (i.e., wardens). The D2D transmitters defend against the wardens by: 1) hiding their transmissions with enhanced covert communication, and 2) leveraging friendly jamming to ensure information secrecy even if the D2D transmissions are detected. We model the combat between the wardens and the D2D network (the transmitters and the friendly jammers) as a two-stage Stackelberg game. Therein, the wardens are the followers at the lower stage aiming to minimize their detection errors, and the D2D network is the leader at the upper stage aiming to maximize its utility (in terms of link reliability and communication security) subject to the constraint on communication covertness. We apply stochastic geometry to model the network spatial configuration so as to conduct a system-level study. We develop a bi-level optimization algorithm to search for the equilibrium of the proposed Stackelberg game based on the successive convex approximation (SCA) method and Rosenbrock method. Numerical results reveal interesting insights. We observe that without the assistance from the jammers, it is difficult to achieve covert communication on D2D transmission. Moreover, we illustrate the advantages of the proposed friendly jamming-assisted covert communication by comparing it with the information-theoretical secrecy approach in terms of the secure communication probability and network utility.

Submitted 29 September, 2022; originally announced September 2022.

arXiv:2209.00196 [pdf, other]

Group frame neural network of moving object ghost imaging combined with frame merging algorithm

Authors: Da Chen, Shan-Guo Feng, Hua-Hua Wang, Jia-Ning Cao, Zhi-Wei Zhang, Zhi-Xin Yang, Ao Yan, Lu Gao, Ze Zhang

Abstract: The nature of multiple samples to extract correlation information limits the applications of ghost imaging of moving objects. A novel multi-to-one neural network is proposed and the concept of "batch frame" is introduced to improve the serial imaging method. The neural network extracts more correlation information from a small number of samples, thus reducing the sampling ratio of the ghost imaging technique. We combine the correlation characteristics between images to propose a frame merging algorithm, which eliminates the dynamic blur of high-speed moving objects and further improves the reconstruction quality of moving object images at a low sampling ratio. The experimental results are consistent with the simulation results.

Submitted 31 August, 2022; originally announced September 2022.

Comments: 12 pages, 7 figures

arXiv:2208.12753 [pdf, other]

Spatio-Temporal Representation Learning Enhanced Source Cell-phone Recognition from Speech Recordings

Authors: Chunyan Zeng, Shixiong Feng, Zhifeng Wang, Xiangkui Wan, Yunfan Chen, Nan Zhao

Abstract: The existing source cell-phone recognition method lacks the long-term feature characterization of the source device, resulting in inaccurate representation of the source cell-phone related features which leads to insufficient recognition accuracy. In this paper, we propose a source cell-phone recognition method based on spatio-temporal representation learning, which includes two main parts: extraction of sequential Gaussian mean matrix features and construction of a recognition model based on spatio-temporal representation learning. In the feature extraction part, based on the analysis of time-series representation of recording source signals, we extract sequential Gaussian mean matrix with long-term and short-term representation ability by using the sensitivity of Gaussian mixture model to data distribution. In the model construction part, we design a structured spatio-temporal representation learning network C3D-BiLSTM to fully characterize the spatio-temporal information, combine 3D convolutional network and bidirectional long short-term memory network for short-term spectral information and long-time fluctuation information representation learning, and achieve accurate recognition of cell-phones by fusing spatio-temporal feature information of recording source signals. The method achieves an average accuracy of 99.03% for the closed-set recognition of 45 cell-phones under the CCNU_Mobile dataset, and 98.18% in small sample size experiments, with recognition performance better than the existing state-of-the-art methods. The experimental results show that the method exhibits excellent recognition performance in multi-class cell-phones recognition.

Submitted 25 August, 2022; originally announced August 2022.

Comments: 29 pages, 4 figures

arXiv:2207.09259 [pdf, other]

Adaptive Testing for Connected and Automated Vehicles with Sparse Control Variates in Overtaking Scenarios

Authors: Jingxuan Yang, Honglin He, Yi Zhang, Shuo Feng, Henry X. Liu

Abstract: Testing and evaluation is a critical step in the development and deployment of connected and automated vehicles (CAVs). Due to the black-box property and various types of CAVs, how to test and evaluate CAVs adaptively remains a major challenge. Many approaches have been proposed to adaptively generate testing scenarios during the testing process. However, most existing approaches cannot be applied to complex scenarios, where the variables needed to define such scenarios are high dimensional. Towards filling this gap, the adaptive testing with sparse control variates method is proposed in this paper. Instead of adaptively generating testing scenarios, our approach evaluates CAVs' performances by adaptively utilizing the testing results. Specifically, each testing result is adjusted using multiple linear regression techniques based on control variates. As the regression coefficients can be adaptively optimized for the CAV under test, using the adjusted results can reduce the estimation variance, compared with using the testing results directly. To overcome the high dimensionality challenge, sparse control variates are utilized only for the critical variables of testing scenarios. To validate the proposed method, the high-dimensional overtaking scenarios are investigated, and the results demonstrate that our approach can further accelerate the evaluation process by about 30 times.

Submitted 19 July, 2022; originally announced July 2022.

arXiv:2207.08332 [pdf, other]

Quantized Consensus under Data-Rate Constraints and DoS Attacks: A Zooming-In and Holding Approach

Authors: Maopeng Ran, Shuai Feng, Juncheng Li, Lihua Xie

Abstract: This paper is concerned with the quantized consensus problem for uncertain nonlinear multi-agent systems under data-rate constraints and Denial-of-Service (DoS) attacks. The agents are modeled in strict-feedback form with unknown nonlinear dynamics and external disturbance. Extended state observers (ESOs) are leveraged to estimate agents' total uncertainties along with their states. To mitigate the effects of DoS attacks, a novel dynamic quantization with zooming-in and holding capabilities is proposed. The idea is to zoom-in and hold the variable to be quantized if the system is in the absence and presence of DoS attacks, respectively. The control protocol is given in terms of the outputs of the ESOs and the dynamic-quantization-based encoders and decoders. We show that, for a connected undirected network, the developed control protocol is capable of handling any DoS attacks inducing bounded consecutive packet losses with merely 3-level quantization. The application of the zooming-in and holding approach to known linear multi-agent systems is also discussed.

Submitted 17 July, 2022; originally announced July 2022.

Comments: 16 pages, 8 figures

arXiv:2207.01430 [pdf, ps, other]

Krasovskii and Shifted Passivity Based Output Consensus

Authors: Yu Kawano, Michele Cucuzzella, Shuai Feng, Jacquelien M. A. Scherpen

Abstract: Motivated by current sharing in power networks, we consider a class of output consensus (also called agreement) problems for nonlinear systems, where the consensus value is determined by external disturbances, e.g., power demand. This output consensus problem is solved by a simple distributed output feedback controller if a system is either Krasovskii or shifted passive, which is the only essential requirement. The effectiveness of the proposed controller is shown in simulation on an islanded DC power network.

Submitted 4 July, 2022; originally announced July 2022.

arXiv:2206.00105 [pdf, other]

doi 10.5121/csit.2022.120901

Deep learning pipeline for image classification on mobile phones

Authors: Muhammad Muneeb, Samuel F. Feng, Andreas Henschel

Abstract: This article proposes and documents a machine-learning framework and tutorial for classifying images using mobile phones. Compared to computers, the performance of deep learning model performance degrades when deployed on a mobile phone and requires a systematic approach to find a model that performs optimally on both computers and mobile phones. By following the proposed pipeline, which consists of various computational tools, simple procedural recipes, and technical considerations, one can bring the power of deep learning medical image classification to mobile devices, potentially unlocking new domains of applications. The pipeline is demonstrated on four different publicly available datasets: COVID X-rays, COVID CT scans, leaves, and colorectal cancer. We used two application development frameworks: TensorFlow Lite (real-time testing) and Flutter (digital image testing) to test the proposed pipeline. We found that transferring deep learning models to a mobile phone is limited by hardware and classification accuracy drops. To address this issue, we proposed this pipeline to find an optimized model for mobile phones. Finally, we discuss additional applications and computational concerns related to deploying deep-learning models on phones, including real-time analysis and image preprocessing. We believe the associated documentation and code can help physicians and medical experts develop medical image classification applications for distribution.

Submitted 31 May, 2022; originally announced June 2022.

Comments: 20 pages

Journal ref: 9th International Conference on Artificial Intelligence and Applications (AIAPP 2022)

arXiv:2205.12633 [pdf, other]

NTIRE 2022 Challenge on High Dynamic Range Imaging: Methods and Results

Authors: Eduardo Pérez-Pellitero, Sibi Catley-Chandar, Richard Shaw, Aleš Leonardis, Radu Timofte, Zexin Zhang, Cen Liu, Yunbo Peng, Yue Lin, Gaocheng Yu, ** Zhang, Zhe Ma, Hongbin Wang, Xiangyu Chen, Xintao Wang, Haiwei Wu, Lin Liu, Chao Dong, Jiantao Zhou, Qingsen Yan, Song Zhang, Weiye Chen, Yuhang Liu, Zhen Zhang, Yanning Zhang, et al. (68 additional authors not shown)

Abstract: This paper reviews the challenge on constrained high dynamic range (HDR) imaging that was part of the New Trends in Image Restoration and Enhancement (NTIRE) workshop, held in conjunction with CVPR 2022. This manuscript focuses on the competition set-up, datasets, the proposed methods and their results. The challenge aims at estimating an HDR image from multiple respective low dynamic range (LDR) observations, which might suffer from under- or over-exposed regions and different sources of noise. The challenge is composed of two tracks with an emphasis on fidelity and complexity constraints: In Track 1, participants are asked to optimize objective fidelity scores while imposing a low-complexity constraint (i.e. solutions cannot exceed a given number of operations). In Track 2, participants are asked to minimize the complexity of their solutions while imposing a constraint on fidelity scores (i.e. solutions are required to obtain a higher fidelity score than the prescribed baseline). Both tracks use the same data and metrics: Fidelity is measured by means of PSNR with respect to a ground-truth HDR image (computed both directly and with a canonical tonemapping operation), while complexity metrics include the number of Multiply-Accumulate (MAC) operations and runtime (in seconds).

Submitted 25 May, 2022; originally announced May 2022.

Comments: CVPR Workshops 2022. 15 pages, 21 figures, 2 tables

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2022
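
The fidelity metric named in the abstract above, PSNR against a ground-truth image, can be illustrated with a minimal sketch. This is not the challenge's evaluation code: the function name `psnr`, the flat list of pixel values, and the default `peak` of 1.0 are assumptions made for illustration, and the tonemapped variant mentioned in the abstract is not reproduced here.

```python
import math


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists.

    `peak` is the maximum possible pixel value (assumed 1.0 for normalized images).
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

A higher value means the estimate is closer to the reference; an error of 0.1 on every pixel of a normalized image gives roughly 20 dB.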

arXiv:2204.13731 [pdf, other]

An Intriguing Property of Geophysics Inversion

Authors: Yinan Feng, Yinpeng Chen, Shihang Feng, Peng **, Zicheng Liu, Youzuo Lin

Abstract: Inversion techniques are widely used to reconstruct subsurface physical properties (e.g., velocity, conductivity) from surface-based geophysical measurements (e.g., seismic, electric/magnetic (EM) data). The problems are governed by partial differential equations (PDEs) like the wave or Maxwell's equations. Solving geophysical inversion problems is challenging due to the ill-posedness and high computational cost. To alleviate those issues, recent studies leverage deep neural networks to learn the inversion mappings from measurements to the property directly. In this paper, we show that such a mapping can be well modeled by a very shallow (but not wide) network with only five layers. This is achieved based on our new finding of an intriguing property: a near-linear relationship between the input and output, after applying integral transform in high dimensional space. In particular, when dealing with the inversion from seismic data to subsurface velocity governed by a wave equation, the integral results of velocity with Gaussian kernels are linearly correlated to the integral of seismic data with sine kernels. Furthermore, this property can be easily turned into a light-weight encoder-decoder network for inversion. The encoder contains the integration of seismic data and the linear transformation without need for fine-tuning. The decoder only consists of a single transformer block to reverse the integral of velocity. Experiments show that this interesting property holds for two geophysics inversion problems over four different datasets. Compared to the much deeper InversionNet, our method achieves comparable accuracy, but consumes significantly fewer parameters.

Submitted 16 June, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

arXiv:2204.03178 [pdf, other]

3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition

Authors: Zhao You, Shulin Feng, Dan Su, Dong Yu

Abstract: Recently, the Conformer-based CTC/AED model has become a mainstream architecture for ASR. In this paper, based on our prior work, we identify and integrate several approaches to achieve further improvements for ASR tasks, which we denote as multi-loss, multi-path and multi-level, summarized as the "3M" model. Specifically, multi-loss refers to the joint CTC/AED loss and multi-path denotes the Mixture-of-Experts (MoE) architecture, which can effectively increase the model capacity without remarkably increasing computation cost. Multi-level means that we introduce auxiliary loss at multiple levels of a deep model to help training. We evaluate our proposed method on the public WenetSpeech dataset and experimental results show that the proposed method provides 12.2%-17.6% relative CER improvement over the baseline model trained by Wenet toolkit. On our large scale dataset of 150k hours corpus, the 3M model has also shown obvious superiority over the baseline Conformer model. Code is publicly available at https://github.com/tencent-ailab/3m-asr.

Submitted 14 April, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

Comments: 5 pages, 1 figure. Submitted to INTERSPEECH 2022

arXiv:2203.03969 [pdf, other]

A Dynamic Hierarchical Framework for IoT-assisted Metaverse Synchronization

Authors: Yue Han, Dusit Niyato, Cyril Leung, Dong In Kim, Kun Zhu, Shaohan Feng, Sherman Xuemin Shen, Chunyan Miao

Abstract: Metaverse has recently attracted much attention from both academia and industry. Virtual services, ranging from virtual driver training to online route optimization for smart goods delivery, are emerging in the Metaverse. To make the human experience of virtual life more real, digital twins (DTs), namely digital replicas of physical objects, are key enablers. However, DT status may not always accurately reflect that of its real-world twin because the latter may be subject to changes with time. As such, it is necessary to synchronize a DT with its physical counterpart to ensure that its status is accurate for virtual businesses in the Metaverse. In this paper, we propose a dynamic hierarchical framework in which a group of IoT devices is incentivized to sense and collect physical objects' status information collectively so as to assist virtual service providers (VSPs) in synchronizing DTs. Based on the collected sensing data and the value decay rate of the DTs, the VSPs can determine synchronization intensities to maximize their payoffs. In our proposed dynamic hierarchical framework, the lower-level evolutionary game captures the VSP selection by the IoT device population, and the upper-level differential game captures the VSPs' payoffs, which are affected by the synchronization strategy, IoT devices' selections, and the DTs' value status, given VSPs are simultaneous decision makers. We further consider the case in which some VSPs are first movers and extend it as a Stackelberg differential game. We theoretically and experimentally show that the equilibrium to the lower-level game exists and is evolutionarily robust, and provide a sensitivity analysis with respect to various system parameters. Experiments show that the proposed dynamic hierarchical game outperforms the baseline.

Submitted 14 March, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

arXiv:2202.11880 [pdf, other]

On Nash-Stackelberg-Nash Games under Decision-Dependent Uncertainties: Model and Equilibrium

Authors: Yunfan Zhang, Feng Liu, Zhaojian Wang, Yue Chen, Shuanglei Feng, Qiuwei Wu, Yunhe Hou

Abstract: In this paper, we discuss a class of two-stage hierarchical games with multiple leaders and followers, which are called Nash-Stackelberg-Nash (N-S-N) games. Particularly, we consider N-S-N games under decision-dependent uncertainties (DDUs). DDUs refer to the uncertainties that are affected by the strategies of decision-makers and have been rarely addressed in game equilibrium analysis. In this paper, we first formulate the N-S-N games with DDUs of complete ignorance, where the interactions between the players and DDUs are characterized by uncertainty sets that depend parametrically on the players' strategies. Then, a rigorous definition for the equilibrium of the game is established by consolidating generalized Nash equilibrium and Pareto-Nash equilibrium. Afterward, we prove the existence of the equilibrium of N-S-N games under DDUs by applying Kakutani's fixed-point theorem. Finally, an illustrative example is provided to show the impact of DDUs on the equilibrium of N-S-N games.

Submitted 23 February, 2022; originally announced February 2022.

arXiv:2202.04542 [pdf, other]

Spectrally Adaptive Common Spatial Patterns

Authors: Mahta Mousavi, Eric Lybrand, Shuangquan Feng, Shuai Tang, Rayan Saab, Virginia de Sa

Abstract: The method of Common Spatial Patterns (CSP) is widely used for feature extraction of electroencephalography (EEG) data, such as in motor imagery brain-computer interface (BCI) systems. It is a data-driven method estimating a set of spatial filters so that the power of the filtered EEG signal is maximized for one motor imagery class and minimized for the other. This method, however, is prone to overfitting and is known to suffer from poor generalization especially with limited calibration data. Additionally, due to the high heterogeneity in brain data and the non-stationarity of brain activity, CSP is usually trained for each user separately resulting in long calibration sessions or frequent re-calibrations that are tiring for the user. In this work, we propose a novel algorithm called Spectrally Adaptive Common Spatial Patterns (SACSP) that improves CSP by learning a temporal/spectral filter for each spatial filter so that the spatial filters are concentrated on the most relevant temporal frequencies for each user. We show the efficacy of SACSP in providing better generalizability and higher classification accuracy from calibration to online control compared to existing methods. Furthermore, we show that SACSP provides neurophysiologically relevant information about the temporal frequencies of the filtered signals. Our results highlight the differences in the motor imagery signal among BCI users as well as spectral differences in the signals generated for each class, and show the importance of learning robust user-specific features in a data-driven manner.

Submitted 9 February, 2022; originally announced February 2022.

arXiv:2201.11207 [pdf, other]

Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition

Authors: Piotr Żelasko, Siyuan Feng, Laureano Moro Velazquez, Ali Abavisani, Saurabhchand Bhati, Odette Scharenborg, Mark Hasegawa-Johnson, Najim Dehak

Abstract: The high cost of data acquisition makes Automatic Speech Recognition (ASR) model training problematic for most existing languages, including languages that do not even have a written script, or for which the phone inventories remain unknown. Past works explored multilingual training, transfer learning, as well as zero-shot learning in order to build ASR systems for these low-resource languages. While it has been shown that the pooling of resources from multiple languages is helpful, we have not yet seen a successful application of an ASR model to a language unseen during training. A crucial step in the adaptation of ASR from seen to unseen languages is the creation of the phone inventory of the unseen language. The ultimate goal of our work is to build the phone inventory of a language unseen during training in an unsupervised way without any knowledge about the language. In this paper, we 1) investigate the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language; 2) provide an analysis of which phones transfer well across languages and which do not in order to understand the limitations of and areas for further improvement for automatic phone inventory creation; and 3) present different methods to build a phone inventory of an unseen language in an unsupervised way. To that end, we conducted mono-, multi-, and crosslingual experiments on a set of 13 phonetically diverse languages and several in-depth analyses. We found a number of universal phone tokens (IPA symbols) that are well-recognized cross-linguistically. Through a detailed analysis of results, we conclude that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.

Submitted 27 January, 2022; v1 submitted 26 January, 2022; originally announced January 2022.

Comments: Accepted for publication in Computer Speech and Language

arXiv:2201.04908 [pdf, ps, other]

The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition

Authors: Luke Prananta, Bence Mark Halpern, Siyuan Feng, Odette Scharenborg

Abstract: In this paper, we investigate several existing and a new state-of-the-art generative adversarial network-based (GAN) voice conversion method for enhancing dysarthric speech for improved dysarthric speech recognition. We compare key components of existing methods as part of a rigorous ablation study to find the most effective solution to improve dysarthric speech recognition. We find that straightforward signal processing methods such as stationary noise removal and vocoder-based time stretching lead to dysarthric speech recognition results comparable to those obtained when using state-of-the-art GAN-based voice conversion methods as measured using a phoneme recognition task. Additionally, our proposed solution of a combination of MaskCycleGAN-VC and time stretched enhancement is able to improve the phoneme recognition results for certain dysarthric speakers compared to our time stretched baseline.

Submitted 13 January, 2022; originally announced January 2022.

Comments: Extended version of paper to be submitted to Interspeech 2022. 6 pages, 2 tables

arXiv:2111.11831 [pdf, other]

SpeechMoE2: Mixture-of-Experts Model with Improved Routing

Authors: Zhao You, Shulin Feng, Dan Su, Dong Yu

Abstract: Mixture-of-experts based acoustic models with dynamic routing mechanisms have shown promising results for speech recognition. The design principle of router architecture is important for the large model capacity and high computational efficiency. Our previous work SpeechMoE only uses local grapheme embedding to help routers to make route decisions. To further improve speech recognition performance against varying domains and accents, we propose a new router architecture which integrates additional global domain and accent embedding into router input to promote adaptability. Experimental results show that the proposed SpeechMoE2 can achieve lower character error rate (CER) with comparable parameters than SpeechMoE on both multi-domain and multi-accent tasks. Primarily, the proposed method provides up to 1.6% - 4.8% relative CER improvement for the multi-domain task and 1.9% - 17.7% relative CER improvement for the multi-accent task respectively. Besides, increasing the number of experts also achieves consistent performance improvement and keeps the computational cost constant.

Submitted 23 November, 2021; originally announced November 2021.

Comments: 5 pages, 1 figure. Submitted to ICASSP 2022
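
Several abstracts in this listing report "relative CER improvement". As a minimal sketch of the arithmetic behind such a figure, assuming the usual definition (baseline error minus new error, divided by baseline error), the helper below computes it. The function name and the CER values in the usage note are hypothetical and are not taken from any of the listed papers.

```python
def relative_improvement(baseline_cer, new_cer):
    """Relative error-rate reduction as a fraction of the baseline error rate."""
    if baseline_cer <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_cer - new_cer) / baseline_cer
```

With hypothetical rates of 10.0% for a baseline and 9.0% for a new model, `relative_improvement(10.0, 9.0)` is about 0.10, i.e. a 10% relative improvement, even though the absolute difference is only one percentage point.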

arXiv:2111.02926 [pdf, other]

OpenFWI: Large-Scale Multi-Structural Benchmark Datasets for Seismic Full Waveform Inversion

Authors: Chengyuan Deng, Shihang Feng, Hanchen Wang, Xitong Zhang, Peng **, Yinan Feng, Qili Zeng, Yinpeng Chen, Youzuo Lin

Abstract: Full waveform inversion (FWI) is widely used in geophysics to reconstruct high-resolution velocity maps from seismic data. The recent success of data-driven FWI methods results in a rapidly increasing demand for open datasets to serve the geophysics community. We present OpenFWI, a collection of large-scale multi-structural benchmark datasets, to facilitate diversified, rigorous, and reproducible research on FWI. In particular, OpenFWI consists of 12 datasets (2.1TB in total) synthesized from multiple sources. It encompasses diverse domains in geophysics (interface, fault, CO2 reservoir, etc.), covers different geological subsurface structures (flat, curve, etc.), and contains various amounts of data samples (2K - 67K). It also includes a dataset for 3D FWI. Moreover, we use OpenFWI to perform benchmarking over four deep learning methods, covering both supervised and unsupervised learning regimes. Along with the benchmarks, we implement additional experiments, including physics-driven methods, complexity analysis, generalization study, uncertainty quantification, and so on, to sharpen our understanding of datasets and methods. The studies either provide valuable insights into the datasets and the performance, or uncover their current limitations. We hope OpenFWI supports prospective research on FWI and inspires future open-source efforts on AI for science. All datasets and related information can be accessed through our website at https://openfwi-lanl.github.io/

Submitted 23 June, 2023; v1 submitted 4 November, 2021; originally announced November 2021.

Comments: This manuscript has been accepted by NeurIPS 2022 dataset and benchmark track

arXiv:2110.14879 [pdf, ps, other]

Pilot Optimization and Channel Estimation for Two-way Relaying Network Aided by IRS with Finite Discrete Phase Shifters

Authors: Zhongwen Sun, Xuehui Wang, Siling Feng, Xinrong Guan, Feng Shu, Jiangzhou Wang

Abstract: In this paper, we investigate the problem of pilot optimization and channel estimation of a two-way relaying network (TWRN) aided by an intelligent reflecting surface (IRS) with finite discrete phase shifters. In a TWRN, there exists a challenging problem that the two cascading channels from source-to-IRS-to-relay and destination-to-IRS-to-relay interfere with each other. Via designing the initial phase shifts of IRS and pilot pattern, the two cascading channels are separated by using simple arithmetic operations like addition and subtraction. Then, the least-squares estimator is adopted to estimate the two cascading channels and two direct channels from source to relay and destination to relay. The corresponding mean square errors (MSE) of channel estimators are derived. By minimizing MSE, the optimal phase shift matrix of IRS is proved. Then, two special matrices, the Hadamard and discrete Fourier transform (DFT) matrices, are shown to be two optimal training matrices for IRS. Furthermore, the IRS with discrete finite phase shifters is taken into account. Using theoretical derivation and numerical simulations, we find that 3-4 bits phase shifters are sufficient for IRS to achieve a negligible MSE performance loss. More importantly, the Hadamard matrix requires only one-bit phase shifters to achieve the optimal MSE performance while the DFT matrix requires at least three or four bits to achieve the same performance. Thus, the Hadamard matrix is a perfect choice for channel estimation using low-resolution phase-shifting IRS.

Submitted 15 February, 2022; v1 submitted 28 October, 2021; originally announced October 2021.

Comments: 5 pages, 5 figures
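
The abstract above notes that a Hadamard training matrix needs only one-bit phase shifters. A rough intuition, sketched below under stated assumptions, is that every Hadamard entry is +1 or -1 (phases 0 and π), whereas an N-point DFT matrix uses N distinct phases. This is an illustration only and not the paper's derivation: the function names are hypothetical, the Hadamard matrix is built with Sylvester's construction (which requires N to be a power of two), and the bit count assumes a uniform quantizer that must represent every phase exactly.

```python
def sylvester_hadamard(n):
    """Return an n x n Hadamard matrix (entries +1/-1); n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # Sylvester step: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def dft_phase_indices(n):
    """Distinct phase indices m, where DFT entry (j, k) is exp(-2*pi*i*m/n), m = j*k mod n."""
    return {(j * k) % n for j in range(n) for k in range(n)}


def bits_for(levels):
    """Bits needed to index `levels` distinct phase values."""
    return (levels - 1).bit_length()
```

For N = 8, the Hadamard matrix has two distinct entries, so `bits_for(2)` gives 1, while `len(dft_phase_indices(8))` is 8 and `bits_for(8)` gives 3.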

arXiv:2109.00154 [pdf, other]

DOA Estimation Using Massive Receive MIMO: Basic Principle and Key Techniques

Authors: Jiangzhou Wang, Baihua Shi, Feng Shu, Qi Zhang, Di Wu, Qijuan Jie, Zhihong Zhuang, Siling Feng, Yi** Zhang

Abstract: As massive multiple-input multiple-output (MIMO) becomes popular, direction of arrival (DOA) measurement has undergone a real renaissance due to the high resolution achieved. Thus, there is no doubt about DOA estimation using massive MIMO. The purpose of this paper is to describe its basic principles and key techniques, to present the performance analysis, and to appreciate its engineering applications. It is anticipated that there are still many challenges in DOA estimation using massive receive MIMO, such as high circuit cost, high energy consumption and high complexity of the algorithm implementation. New research and breakthroughs are illustrated to deal with those problems. Then, a new architecture, hybrid analog and digital (HAD) massive receive MIMO with low-resolution ADCs, is presented to strike a good balance among circuit cost, complexity and performance. Then, a novel three-dimensional (3D) angle of arrival (AOA) localization method based on geometrical center is proposed to compute the position of a passive emitter using a single base station equipped with an ultra-massive MIMO system. It can achieve the Cramér-Rao lower bound (CRLB). Here, the performance loss is also analyzed to quantify the minimum number of bits. DOA estimation will play a key role in many applications, such as directional modulation, beamforming tracking and alignment for 6G.

Submitted 15 July, 2023; v1 submitted 31 August, 2021; originally announced September 2021.

arXiv:2105.03643 [pdf, ps, other]

Latency-Controlled Neural Architecture Search for Streaming Speech Recognition

Authors: Liqiang He, Shulin Feng, Dan Su, Dong Yu

Abstract: Neural architecture search (NAS) has attracted much attention and has been explored for automatic speech recognition (ASR). In this work, we focus on streaming ASR scenarios and propose the latency-controlled NAS for acoustic modeling. First, based on the vanilla neural architecture, normal cells are altered to causal cells to control the total latency of the architecture. Second, a revised operation space with a smaller receptive field is proposed to generate the final architecture with low latency. Extensive experiments show that: 1) Based on the proposed neural architecture, the neural networks with a medium latency of 550ms (millisecond) and a low latency of 190ms can be learned in the vanilla and revised operation space respectively. 2) For the low latency setting, the evaluation network can achieve more than 19% (average on the four test sets) relative improvements compared with the hybrid CLDNN baseline, on a 10k-hour large-scale dataset.

Submitted 13 September, 2021; v1 submitted 8 May, 2021; originally announced May 2021.

Comments: Accepted to ASRU 2021

arXiv:2105.03036 [pdf, other]

SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts

Authors: Zhao You, Shulin Feng, Dan Su, Dong Yu

Abstract: Recently, Mixture of Experts (MoE) based Transformer has shown promising results in many domains. This is largely due to the following advantages of this architecture: firstly, MoE based Transformer can increase model capacity without increasing computational cost at either training or inference time. Besides, MoE based Transformer is a dynamic network which can adapt to the varying complexity of input instances in real-world applications. In this work, we explore the MoE based model for speech recognition, named SpeechMoE. To further control the sparsity of router activation and improve the diversity of gate values, we propose a sparsity L1 loss and a mean importance loss respectively. In addition, a new router architecture is used in SpeechMoE which can simultaneously utilize the information from a shared embedding network and the hierarchical representation of different MoE layers. Experimental results show that SpeechMoE can achieve lower character error rate (CER) with comparable computation cost than traditional static networks, providing 7.0%-23.0% relative CER improvements on four evaluation datasets.

Submitted 6 May, 2021; originally announced May 2021.

Comments: 5 pages, 2 figures. Submitted to Interspeech 2021

arXiv:2104.13721 [pdf, other]

Optimal Cooperative Driving at Signal-Free Intersections with Polynomial-Time Complexity

Authors: Huaxin Pei, Yuxiao Zhang, Yi Zhang, Shuo Feng

Abstract: Cooperative driving at signal-free intersections, which aims to improve driving safety and efficiency for connected and automated vehicles, has attracted increasing interest in recent years. However, existing cooperative driving strategies either suffer from computational complexity or cannot guarantee global optimality. To fill this research gap, this paper proposes an optimal and computationally efficient cooperative driving strategy with polynomial-time complexity. By modeling the conflict relations among the vehicles, the solution space of the cooperative driving problem is completely represented by a newly designed small-size state space. Then, based on dynamic programming, the globally optimal solution can be searched inside the state space efficiently. It is proved that the proposed strategy can reduce the time complexity of computation from exponential to a small-degree polynomial. Simulation results further demonstrate that the proposed strategy can obtain the globally optimal solution within a limited computation time under various traffic demand settings.

Submitted 28 April, 2021; originally announced April 2021.

Showing 1–50 of 100 results for author: Feng, S