Search | arXiv e-print repository

Robust Optimal Lane-changing Control for Connected Autonomous Vehicles in Mixed Traffic

Authors: Anni Li, Andres S. Chavez Armijos, Christos G. Cassandras

Abstract: We derive time and energy-optimal policies for a Connected Autonomous Vehicle (CAV) to execute lane change maneuvers in mixed traffic, i.e., in the presence of both CAVs and Human Driven Vehicles (HDVs). These policies are also shown to be robust with respect to the unpredictable behavior of HDVs by exploiting CAV cooperation which can eliminate or greatly reduce the interaction between CAVs and HDVs. We derive a simple threshold-based criterion on the initial relative distance between two cooperating CAVs based on which an optimal policy is selected such that the lane-changing CAV merges ahead of a cooperating CAV in the target lane; in this case, the lane-changing CAV's trajectory becomes independent of HDV behavior. Otherwise, the interaction between CAVs and neighboring HDVs is formulated as a bilevel optimization problem with an appropriate behavioral model for an HDV, and an iterated best response (IBR) method is used to determine an equilibrium. We demonstrate the convergence of the IBR process under certain conditions. Furthermore, Control Barrier Functions (CBFs) are implemented to ensure the robustness of lane-changing behaviors by guaranteeing safety in both longitudinal and lateral directions despite HDV disturbances. Simulation results validate the effectiveness of our CAV controllers in terms of cost, safety guarantees, and limited disruption to traffic flow. Additionally, we demonstrate the robustness of the lane-changing behaviors in the presence of uncontrollable HDVs.

Submitted 15 March, 2024; originally announced June 2024.

Comments: arXiv admin note: text overlap with arXiv:2303.16948

arXiv:2406.11175 [pdf, other]

SMRU: Split-and-Merge Recurrent-based UNet for Acoustic Echo Cancellation and Noise Suppression

Authors: Zhihang Sun, Andong Li, Rilin Chen, Hao Zhang, Meng Yu, Yi Zhou, Dong Yu

Abstract: The proliferation of deep neural networks has spawned the rapid development of acoustic echo cancellation and noise suppression, and plenty of prior art has been proposed that yields promising performance. Nevertheless, prior work rarely considers deployment generality across different processing scenarios, such as edge devices and cloud processing. To this end, this paper proposes a general model, termed SMRU, to cover different application scenarios. The novelty is two-fold. First, a multi-scale band split layer and band merge layer are proposed to effectively fuse local frequency bands for lower complexity modeling. Second, by simulating the multi-resolution feature modeling characteristic of the classical UNet structure, a novel recurrent-dominated UNet is devised. It consists of multiple variable frame rate blocks, each of which involves the causal time down-/up-sampling layer with varying compression ratios and the dual-path structure for inter- and intra-band modeling. The model is configured from 50 M/s to 6.8 G/s in terms of MACs, and the experimental results show that the proposed approach yields competitive or even better performance over existing baselines, and has the full potential to adapt to more general scenarios with varying complexity requirements.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.00758 [pdf, other]

Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaption

Authors: Anqi Li, Yuxi Liu, Huihui Bai, Feng Li, Runmin Cong, Meng Wang, Yao Zhao

Abstract: Although recent generative image compression methods have demonstrated impressive potential in optimizing the rate-distortion-perception trade-off, they still face the critical challenge of flexible rate adaption to diverse compression necessities and scenarios. To overcome this challenge, this paper proposes a Controllable Generative Image Compression framework, Control-GIC, the first capable of fine-grained bitrate adaption across a broad spectrum while ensuring high-fidelity and generality compression. We base Control-GIC on a VQGAN framework representing an image as a sequence of variable-length codes (i.e. VQ-indices), which can be losslessly compressed and exhibits a direct positive correlation with the bitrates. Therefore, drawing inspiration from the classical coding principle, we naturally correlate the information density of local image patches with their granular representations, to achieve dynamic adjustment of the code quantity following different granularity decisions. This implies we can flexibly determine a proper allocation of granularity for the patches to acquire desirable compression rates. We further develop a probabilistic conditional decoder that can trace back to historic encoded multi-granularity representations according to transmitted codes, and then reconstruct hierarchical granular features in the formalization of conditional probability, enabling more informative aggregation to improve reconstruction realism.
Our experiments show that Control-GIC allows highly flexible and controllable bitrate adaption and even once compression on an entire dataset to fulfill constrained bitrate conditions. Experimental results demonstrate its superior performance over recent state-of-the-art methods.

Submitted 5 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.17594 [pdf, other]

Towards Achieving Cooperation Compliance of Human Drivers in Mixed Traffic

Authors: Anni Li, Christos G. Cassandras

Abstract: We consider a mixed-traffic environment in transportation systems, where Connected and Automated Vehicles (CAVs) coexist with potentially non-cooperative Human-Driven Vehicles (HDVs). We develop a cooperation compliance control framework to incentivize HDVs to align their behavior with socially optimal objectives using a "refundable toll" scheme so as to achieve a desired compliance probability for all non-compliant HDVs through a feedback control mechanism combining global with local (individual) components. We apply this scheme to the lane-changing problem, where a "Social Planner" provides references to the HDVs, measures their state errors, and induces cooperation compliance for safe lane-changing through a refundable toll approach. Simulation results are included to show the effectiveness of our cooperation compliance controller in terms of improved compliance and lane-changing maneuver safety and efficiency when non-cooperative HDVs are present.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16446 [pdf, ps, other]

A New Solution for MU-MISO Symbol-Level Precoding: Extrapolation and Deep Unfolding

Authors: Mu Liang, Ang Li, Xiaoyan Hu, Christos Masouros

Abstract: Constructive interference (CI) precoding, which converts the harmful multi-user interference into beneficial signals, is a promising and efficient interference management scheme in multi-antenna communication systems. However, CI-based symbol-level precoding (SLP) experiences high computational complexity as the number of symbol slots increases within a transmission block, rendering it unaffordable in practical communication systems. In this paper, we propose a symbol-level extrapolation (SLE) strategy to extrapolate the precoding matrix by leveraging the relationship between different symbol slots within a transmission block, during which the channel state information (CSI) remains constant, where we design a closed-form iterative algorithm based on SLE for both PSK and QAM modulation. In order to further reduce the computational complexity, a sub-optimal closed-form solution based on SLE is developed for PSK and QAM, respectively. Moreover, we design an unsupervised SLE-based neural network (SLE-Net) to unfold the proposed iterative algorithm, which helps enhance the interpretability of the neural network. By carefully designing the loss function of the SLE-Net, the time complexity of the network can be reduced effectively. Extensive simulation results illustrate that the proposed algorithms can dramatically reduce the computational complexity and time complexity with only marginal performance loss, compared with the conventional SLP design methods.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.04167 [pdf, other]

Bridging the Synthetic-to-Authentic Gap: Distortion-Guided Unsupervised Domain Adaptation for Blind Image Quality Assessment

Authors: Aobo Li, **jian Wu, Yongxu Liu, Leida Li

Abstract: The annotation of blind image quality assessment (BIQA) is labor-intensive and time-consuming, especially for authentic images. Training on synthetic data is expected to be beneficial, but synthetically trained models often suffer from poor generalization in real domains due to domain gaps. In this work, we make a key observation that introducing more distortion types in the synthetic dataset may not improve, and may even harm, generalization to authentic image quality assessment. To solve this challenge, we propose distortion-guided unsupervised domain adaptation for BIQA (DGQA), a novel framework that leverages adaptive multi-domain selection via prior knowledge from distortion to match the data distribution between the source domains and the target domain, thereby reducing negative transfer from the outlier source domains. Extensive experiments on two cross-domain settings (synthetic distortion to authentic distortion and synthetic distortion to algorithmic distortion) have demonstrated the effectiveness of our proposed DGQA. Besides, DGQA is orthogonal to existing model-based BIQA methods, and can be used in combination with such models to improve performance with less training data.

Submitted 7 May, 2024; originally announced May 2024.

Comments: Accepted by CVPR2024

arXiv:2404.15364 [pdf, other]

doi 10.1109/LMWT.2024.3386330

MP-DPD: Low-Complexity Mixed-Precision Neural Networks for Energy-Efficient Digital Predistortion of Wideband Power Amplifiers

Authors: Yizhuo Wu, Ang Li, Mohammadreza Beikmirza, Gagan Deep Singh, Qinyu Chen, Leo C. N. de Vreede, Morteza Alavi, Chang Gao

Abstract: Digital Pre-Distortion (DPD) enhances signal quality in wideband RF power amplifiers (PAs). As signal bandwidths expand in modern radio systems, DPD's energy consumption increasingly impacts overall system efficiency. Deep Neural Networks (DNNs) offer promising advancements in DPD, yet their high complexity hinders their practical deployment. This paper introduces open-source mixed-precision (MP) neural networks that employ quantized low-precision fixed-point parameters for energy-efficient DPD. This approach reduces computational complexity and memory footprint, thereby lowering power consumption without compromising linearization efficacy. Applied to a 160MHz-BW 1024-QAM OFDM signal from a digital RF PA, MP-DPD gives no performance loss against 32-bit floating-point precision DPDs, while achieving -43.75 (L)/-45.27 (R) dBc in Adjacent Channel Power Ratio (ACPR) and -38.72 dB in Error Vector Magnitude (EVM). A 16-bit fixed-point-precision MP-DPD enables a 2.8X reduction in estimated inference power. The PyTorch learning and testing code is publicly available at https://github.com/lab-emi/OpenDPD.

Submitted 18 April, 2024; originally announced April 2024.

Comments: Accepted to IEEE Microwave and Wireless Technology Letters (MWTL)

arXiv:2403.09096 [pdf, other]

Deep unfolding Network for Hyperspectral Image Super-Resolution with Automatic Exposure Correction

Authors: Yuan Fang, Yipeng Liu, Jie Chen, Zhen Long, Ao Li, Chong-Yung Chi, Ce Zhu

Abstract: In recent years, the fusion of high spatial resolution multispectral image (HR-MSI) and low spatial resolution hyperspectral image (LR-HSI) has been recognized as an effective method for HSI super-resolution (HSI-SR). However, both HSI and MSI may be acquired under extreme conditions such as night or poorly illuminated scenarios, which may cause different exposure levels, thereby seriously degrading the resulting HSI-SR. In contrast to most existing methods based on respective low-light enhancements (LLIE) of MSI and HSI followed by their fusion, a deep Unfolding HSI Super-Resolution with Automatic Exposure Correction (UHSR-AEC) is proposed that can effectively generate a high-quality fused HSI-SR (in texture and features) even under very imbalanced exposures, thanks to the correlation between LLIE and HSI-SR taken into account. Extensive experiments are provided to demonstrate the state-of-the-art overall performance of the proposed UHSR-AEC, including comparison with some benchmark peer methods.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2402.15944 [pdf, other]

On A Class of Greedy Sparse Recovery Algorithms -- A High Dimensional Approach

Authors: Gang Li, Qiuwei Li, Shuang Li, Wu Angela Li

Abstract: Sparse signal recovery deals with finding the sparsest solution of an under-determined linear system $x = Qs$. In this paper, we propose a novel greedy approach to addressing the challenges from such a problem. Such an approach is based on a characterization of solutions to the system, which allows us to work on the sparse recovery in the $s$-space directly with a given measure. With an $l_2$-based measure, two OMP-type algorithms are proposed, which significantly outperform the classical OMP algorithm in terms of recovery accuracy while maintaining comparable computational complexity. An $l_1$-based algorithm, denoted as the $\text{Alg}_{GBP}$ (greedy basis pursuit) algorithm, is derived. Such an algorithm significantly outperforms the classical BP algorithm. A CoSaMP-type algorithm is also proposed to further enhance the performance of the two proposed OMP-type algorithms. The superior performance of our proposed algorithms is demonstrated through extensive numerical simulations using synthetic data as well as video signals, highlighting their potential for various applications in compressed sensing and signal processing.

Submitted 24 February, 2024; originally announced February 2024.

arXiv:2402.04882 [pdf, other]

LMUFormer: Low Complexity Yet Powerful Spiking Model With Legendre Memory Units

Authors: Zeyu Liu, Gourav Datta, Anni Li, Peter Anthony Beerel

Abstract: Transformer models have demonstrated high accuracy in numerous applications but have high complexity and lack sequential processing capability making them ill-suited for many streaming applications at the edge where devices are heavily resource-constrained. Thus motivated, many researchers have proposed reformulating the transformer models as RNN modules which modify the self-attention computation with explicit states. However, these approaches often incur significant performance degradation. The ultimate goal is to develop a model that has the following properties: parallel training, streaming and low-cost inference, and SOTA performance. In this paper, we propose a new direction to achieve this goal. We show how architectural modifications to a recurrent model can help push its performance toward Transformer models while retaining its sequential processing capability. Specifically, inspired by the recent success of Legendre Memory Units (LMU) in sequence learning tasks, we propose LMUFormer, which augments the LMU with convolutional patch embedding and convolutional channel mixer. Moreover, we present a spiking version of this architecture, which introduces the benefit of states within the patch embedding and channel mixer modules while simultaneously reducing the computing complexity. We evaluated our architectures on multiple sequence datasets.
In comparison to SOTA transformer-based models within the ANN domain on the SCv2 dataset, our LMUFormer demonstrates comparable performance while necessitating a remarkable 53 times reduction in parameters and a substantial 65 times decrement in FLOPs. Additionally, owing to our model's proficiency in real-time data processing, we can achieve a 32.03% reduction in sequence length, all while incurring an inconsequential decline in performance. Our code is publicly available at https://github.com/zeyuliu1037/LMUFormer.git.

Submitted 19 January, 2024; originally announced February 2024.

Comments: The 12th International Conference on Learning Representations (ICLR 2024)

arXiv:2402.03710 [pdf, other]

Listen, Chat, and Edit: Text-Guided Soundscape Modification for Enhanced Auditory Experience

Authors: Xilin Jiang, Cong Han, Yinghao Aaron Li, Nima Mesgarani

Abstract: In daily life, we encounter a variety of sounds, both desirable and undesirable, with limited control over their presence and volume. Our work introduces "Listen, Chat, and Edit" (LCE), a novel multimodal sound mixture editor that modifies each sound source in a mixture based on user-provided text instructions. LCE distinguishes itself with a user-friendly chat interface and its unique ability to edit multiple sound sources simultaneously within a mixture, without needing to separate them. Users input open-vocabulary text prompts, which are interpreted by a large language model to create a semantic filter for editing the sound mixture. The system then decomposes the mixture into its components, applies the semantic filter, and reassembles it into the desired output. We developed a 160-hour dataset with over 100k mixtures, including speech and various audio sources, along with text prompts for diverse editing tasks like extraction, removal, and volume control. Our experiments demonstrate significant improvements in signal quality across all editing tasks and robust performance in zero-shot scenarios with varying numbers and types of sound sources.

Submitted 6 February, 2024; originally announced February 2024.

Comments: preprint

arXiv:2401.00166 [pdf, ps, other]

Block-Level MU-MISO Interference Exploitation Precoding: Optimal Structure and Explicit Duality

Authors: Junwen Yang, Ang Li, Xuewen Liao, Christos Masouros, A. L. Swindlehurst

Abstract: This paper investigates block-level interference exploitation (IE) precoding for multi-user multiple-input single-output (MU-MISO) downlink systems. To overcome the need for symbol-level IE precoding to frequently update the precoding matrix, we propose to jointly optimize all the precoders or transmit signals within a transmission block. The resultant precoders only need to be updated once per block, and while not necessarily constant over all the symbol slots, we refer to the technique as block-level slot-variant IE precoding. Through a careful examination of the optimal structure and the explicit duality inherent in block-level power minimization (PM) and signal-to-interference-plus-noise ratio (SINR) balancing (SB) problems, we discover that the joint optimization can be decomposed into subproblems with smaller variable sizes. As a step further, we propose block-level slot-invariant IE precoding by adding a structural constraint on the slot-variant IE precoding to maintain a constant precoder throughout the block. A novel linear precoder for IE is further presented, and we prove that the proposed slot-variant and slot-invariant IE precoding share an identical solution when the number of symbol slots does not exceed the number of users. Numerical simulations demonstrate that the proposed precoders achieve a significant complexity reduction compared against benchmark schemes, without sacrificing performance.

Submitted 30 December, 2023; originally announced January 2024.

Comments: Submitted to IEEE

arXiv:2312.08507 [pdf, other]

doi 10.1109/ICASSP48485.2024.10446528

Patient-Adaptive and Learned MRI Data Undersampling Using Neighborhood Clustering

Authors: Siddhant Gautam, Angqi Li, Saiprasad Ravishankar

Abstract: There has been much recent interest in adapting undersampled trajectories in MRI based on training data. In this work, we propose a novel patient-adaptive MRI sampling algorithm based on grouping scans within a training set. Scan-adaptive sampling patterns are optimized together with an image reconstruction network for the training scans. The training optimization alternates between determining the best sampling pattern for each scan (based on a greedy search or iterative coordinate descent (ICD)) and training a reconstructor across the dataset. The eventual scan-adaptive sampling patterns on the training set are used as labels to predict sampling design using nearest neighbor search at test time. The proposed algorithm is applied to the fastMRI knee multicoil dataset and demonstrates improved performance over several baselines.

Submitted 31 March, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

arXiv:2311.16456 [pdf, other]

Spiking Neural Networks with Dynamic Time Steps for Vision Transformers

Authors: Gourav Datta, Zeyu Liu, Anni Li, Peter A. Beerel

Abstract: Spiking Neural Networks (SNNs) have emerged as a popular spatio-temporal computing paradigm for complex vision tasks. Recently proposed SNN training algorithms have significantly reduced the number of time steps (down to 1) for improved latency and energy efficiency; however, they target only convolutional neural networks (CNNs). These algorithms, when applied to the recently spotlighted vision transformers (ViTs), either require a large number of time steps or fail to converge. Based on analysis of the histograms of the ANN and SNN activation maps, we hypothesize that each ViT block has a different sensitivity to the number of time steps. We propose a novel training framework that dynamically allocates the number of time steps to each ViT module depending on a trainable score assigned to each time step. In particular, we generate a scalar binary time step mask that filters spikes emitted by each neuron in a leaky-integrate-and-fire (LIF) layer. The resulting SNNs have high activation sparsity and require only accumulate operations (AC), except for the input embedding layer, in contrast to expensive multiply-and-accumulates (MAC) needed in traditional ViTs. This yields significant improvements in energy efficiency. We evaluate our training framework and resulting SNNs on image recognition tasks including CIFAR10, CIFAR100, and ImageNet with different ViT architectures. We obtain a test accuracy of 95.97% with 4.97 time steps with direct encoding on CIFAR10.

Submitted 27 November, 2023; originally announced November 2023.

Comments: Under review

arXiv:2310.05021 [pdf, other]

Toward Intelligent Emergency Control for Large-scale Power Systems: Convergence of Learning, Physics, Computing and Control

Authors: Qiuhua Huang, Renke Huang, Tianzhixi Yin, Sohom Datta, Xueqing Sun, Jason Hou, Jie Tan, Wenhao Yu, Yuan Liu, Xinya Li, Bruce Palmer, Ang Li, Xinda Ke, Marianna Vaiman, Song Wang, Yousu Chen

Abstract: This paper has delved into the pressing need for intelligent emergency control in large-scale power systems, which are experiencing significant transformations and are operating closer to their limits with more uncertainties. Learning-based control methods are promising and have shown effectiveness for intelligent power system control. However, when they are applied to large-scale power systems, there are multifaceted challenges such as scalability, adaptiveness, and security posed by the complex power system landscape, which demand comprehensive solutions. The paper first proposes and instantiates a convergence framework for integrating power systems physics, machine learning, advanced computing, and grid control to realize intelligent grid control at a large scale. Our developed methods and platform based on the convergence framework have been applied to a large (more than 3000 buses) Texas power system, and tested with 56000 scenarios. Our work achieved a 26% reduction in load shedding on average and outperformed existing rule-based control in 99.7% of the test scenarios. The results demonstrated the potential of the proposed convergence framework and DRL-based intelligent control for the future grid.

Submitted 8 October, 2023; originally announced October 2023.

Comments: submitted to PSCC 2024

arXiv:2310.00534 [pdf, other]

Safe Optimal Interactions Between Automated and Human-Driven Vehicles in Mixed Traffic with Event-triggered Control Barrier Functions

Authors: Anni Li, Christos G. Cassandras, Wei Xiao

Abstract: This paper studies safe driving interactions between Human-Driven Vehicles (HDVs) and Connected and Automated Vehicles (CAVs) in mixed traffic where the dynamics and control policies of HDVs are unknown and hard to predict. In order to address this challenge, we employ event-triggered Control Barrier Functions (CBFs) to estimate the HDV model online, construct data-driven and state-feedback safety controllers, and transform constrained optimal control problems for CAVs into a sequence of event-triggered quadratic programs. We show that we can ensure collision-free interactions between HDVs and CAVs and demonstrate the robustness and flexibility of our framework on different types of human drivers in lane-changing scenarios while guaranteeing safety with human-in-the-loop interactions.

Submitted 30 September, 2023; originally announced October 2023.

arXiv:2309.15938 [pdf, other]

Exploring Self-Supervised Contrastive Learning of Spatial Sound Event Representation

Authors: Xilin Jiang, Cong Han, Yinghao Aaron Li, Nima Mesgarani

Abstract: In this study, we present a simple multi-channel framework for contrastive learning (MC-SimCLR) to encode 'what' and 'where' of spatial audios. MC-SimCLR learns joint spectral and spatial representations from unlabeled spatial audios, thereby enhancing both event classification and sound localization in downstream tasks. At its core, we propose a multi-level data augmentation pipeline that augments different levels of audio features, including waveforms, Mel spectrograms, and generalized cross-correlation (GCC) features. In addition, we introduce simple yet effective channel-wise augmentation methods to randomly swap the order of the microphones and mask Mel and GCC channels. By using these augmentations, we find that linear layers on top of the learned representation significantly outperform supervised models in terms of both event classification accuracy and localization error. We also perform a comprehensive analysis of the effect of each augmentation method and a comparison of the fine-tuning performance using different amounts of labeled data.

Submitted 27 September, 2023; originally announced September 2023.

arXiv:2309.14324 [pdf, other]

Towards General-Purpose Text-Instruction-Guided Voice Conversion

Authors: Chun-Yi Kuan, Chen An Li, Tsu-Yuan Hsu, Tse-Yang Lin, Ho-Lam Chung, Kai-Wei Chang, Shuo-yiin Chang, Hung-yi Lee

Abstract: This paper introduces a novel voice conversion (VC) model, guided by text instructions such as "articulate slowly with a deep tone" or "speak in a cheerful boyish voice". Unlike traditional methods that rely on reference utterances to determine the attributes of the converted speech, our model adds versatility and specificity to voice conversion. The proposed VC model is a neural codec language model which processes a sequence of discrete codes, resulting in the code sequence of converted speech. It utilizes text instructions as style prompts to modify the prosody and emotional information of the given speech. In contrast to previous approaches, which often rely on employing separate encoders like prosody and content encoders to handle different aspects of the source speech, our model handles various information of speech in an end-to-end manner. Experiments have demonstrated the impressive capabilities of our model in comprehending instructions and delivering reasonable results.

Submitted 16 January, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

Comments: Accepted to ASRU 2023

arXiv:2309.10753 [pdf, ps, other]

Generalized Cactus and Structural Controllability of Switched Linear Continuous-Time Systems

Authors: Yuan Zhang, Yuanqing Xia, Aming Li

Abstract: This paper explores the structural controllability of switched linear continuous-time systems. It first identifies a gap in the proof for a pivotal criterion for the structural controllability of switched linear systems in the literature. To address this void, we develop novel graph-theoretic concepts, such as multi-layer dynamic graphs, generalized stems/buds, and generalized cacti, and based on them, provide a comprehensive proof for this criterion. Our approach also induces a new, generalized cactus based graph-theoretic criterion for structural controllability. This not only extends Lin's cactus-based graph-theoretic condition to switched systems for the first time, but also provides a lower bound for the generic dimension of controllable subspaces.

Submitted 22 May, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

Comments: Under review in IEEE TAC; fixed some typos

arXiv:2309.09493 [pdf, other]

HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform

Authors: Yinghao Aaron Li, Cong Han, Xilin Jiang, Nima Mesgarani

Abstract: Recent advancements in speech synthesis have leveraged GAN-based networks like HiFi-GAN and BigVGAN to produce high-fidelity waveforms from mel-spectrograms. However, these networks are computationally expensive and parameter-heavy. iSTFTNet addresses these limitations by integrating inverse short-time Fourier transform (iSTFT) into the network, achieving both speed and parameter efficiency. In this paper, we introduce an extension to iSTFTNet, termed HiFTNet, which incorporates a harmonic-plus-noise source filter in the time-frequency domain that uses a sinusoidal source from the fundamental frequency (F0) inferred via a pre-trained F0 estimation network for fast inference speed. Subjective evaluations on LJSpeech show that our model significantly outperforms both iSTFTNet and HiFi-GAN, achieving ground-truth-level performance. HiFTNet also outperforms BigVGAN-base on LibriTTS for unseen speakers and achieves comparable performance to BigVGAN while being four times faster with only $1/6$ of the parameters. Our work sets a new benchmark for efficient, high-quality neural vocoding, paving the way for real-time applications that demand high quality speech synthesis.

Submitted 18 September, 2023; originally announced September 2023.

arXiv:2308.12749 [pdf, other]

Block-Level Interference Exploitation Precoding for MU-MISO: An ADMM Approach

Authors: Yiran Wang, Yunsi Wen, Ang Li, Xiaoyan Hu, Christos Masouros

Abstract: We study constructive interference based block-level precoding (CI-BLP) in the downlink of multi-user multiple-input single-output (MU-MISO) systems. Specifically, our aim is to extend the analysis on CI-BLP to the case where the considered number of symbol slots is smaller than that of the users. To this end, we mathematically prove the feasibility of using the pseudo-inverse to obtain the optimal CI-BLP precoding matrix in a closed form. Similar to the case when the number of users is small, we show that a quadratic programming (QP) optimization on simplex can be constructed. We also design a low-complexity algorithm based on the alternating direction method of multipliers (ADMM) framework, which can efficiently solve large-scale QP problems. We further analyze the convergence and complexity of the proposed algorithm. Numerical results validate our analysis and the optimality of the proposed algorithm, and further show that the proposed algorithm offers a flexible performance-complexity tradeoff by limiting the maximum number of iterations, which motivates the use of CI-BLP in practical wireless systems.

Submitted 30 August, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

arXiv:2308.11636 [pdf, other]

Aggregating Intrinsic Information to Enhance BCI Performance through Federated Learning

Authors: Rui Liu, Yuanyuan Chen, Anran Li, Yi Ding, Han Yu, Cuntai Guan

Abstract: Insufficient data is a long-standing challenge for Brain-Computer Interface (BCI) to build a high-performance deep learning model. Though numerous research groups and institutes collect a multitude of EEG datasets for the same BCI task, sharing EEG data from multiple sites is still challenging due to the heterogeneity of devices. The significance of this challenge cannot be overstated, given the critical role of data diversity in fostering model robustness. However, existing works rarely discuss this issue, predominantly centering their attention on model training within a single dataset, often in the context of inter-subject or inter-session settings. In this work, we propose a hierarchical personalized Federated Learning EEG decoding (FLEEG) framework to surmount this challenge. This innovative framework heralds a new learning paradigm for BCI, enabling datasets with disparate data formats to collaborate in the model training process. Each client is assigned a specific dataset and trains a hierarchical personalized model to manage diverse data formats and facilitate information exchange. Meanwhile, the server coordinates the training procedure to harness knowledge gleaned from all datasets, thus elevating overall performance. The framework has been evaluated in Motor Imagery (MI) classification with nine EEG datasets collected by different devices but implementing the same MI task. Results demonstrate that the proposed frame can boost classification performance up to 16.7% by enabling knowledge sharing between multiple datasets, especially for smaller datasets. Visualization results also indicate that the proposed framework can empower the local models to put a stable focus on task-related areas, yielding better performance. To the best of our knowledge, this is the first end-to-end solution to address this important challenge.

Submitted 14 August, 2023; originally announced August 2023.

arXiv:2307.14797 [pdf, other]

Symbol-Level Precoding for MU-MIMO System with RIRC Receiver

Authors: Xiao Tong, Ang Li, Lei Lei, Fan Liu, Fuwang Dong

Abstract: Consider a multiuser multiple-input multiple-output (MU-MIMO) downlink system in which the base station (BS) sends multiple data streams to multi-antenna users via symbol-level precoding (SLP), where the optimization of receive combining matrix becomes crucial, unlike in the single-antenna user scenario. We begin by introducing a joint optimization problem on the symbol-level transmit precoder and receive combiner. The problem is solved using the alternating optimization (AO) method, and the optimal solution structures for transmit precoding and receive combining matrices are derived by using Lagrangian and Karush-Kuhn-Tucker (KKT) conditions, based on which, the original problem is transformed into an equivalent quadratic programming problem, enabling more efficient solutions. To address the challenge that the above joint design is difficult to implement, we propose a more practical scheme where the receive combining optimization is replaced by the interference rejection combiner (IRC), which is however difficult to directly use because of the rank-one transmit precoding matrix. Therefore, we introduce a new regularized IRC (RIRC) receiver to circumvent the above issue. Numerical results demonstrate that the practical SLP-RIRC method enjoys only a slight communication performance loss compared to the joint transmit precoding and receive combining design, both offering substantial performance gains over the conventional BD-based approaches.

Submitted 27 July, 2023; originally announced July 2023.

Comments: 13 pages, 10 figures

arXiv:2307.09435 [pdf, other]

SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs

Authors: Yinghao Aaron Li, Cong Han, Nima Mesgarani

Abstract: In recent years, large-scale pre-trained speech language models (SLMs) have demonstrated remarkable advancements in various generative speech modeling applications, such as text-to-speech synthesis, voice conversion, and speech enhancement. These applications typically involve mapping text or speech inputs to pre-trained SLM representations, from which target speech is decoded. This paper introduces a new approach, SLMGAN, to leverage SLM representations for discriminative tasks within the generative adversarial network (GAN) framework, specifically for voice conversion. Building upon StarGANv2-VC, we add our novel SLM-based WavLM discriminators on top of the mel-based discriminators along with our newly designed SLM feature matching loss function, resulting in an unsupervised zero-shot voice conversion system that does not require text labels during training. Subjective evaluation results show that SLMGAN outperforms existing state-of-the-art zero-shot voice conversion models in terms of naturalness and achieves comparable similarity, highlighting the potential of SLM-based discriminators for related applications.

Submitted 18 July, 2023; originally announced July 2023.

Comments: WASPAA 2023

arXiv:2307.00535 [pdf, other]

Goal-oriented Tensor: Beyond Age of Information Towards Semantics-Empowered Goal-Oriented Communications

Authors: Aimin Li, Shaohua Wu, Sumei Sun, Jie Cao

Abstract: Optimizations premised on open-loop metrics such as Age of Information (AoI) indirectly enhance the system's decision-making utility. We therefore propose a novel closed-loop metric named Goal-oriented Tensor (GoT) to directly quantify the impact of semantic mismatches on goal-oriented decision-making utility. Leveraging the GoT, we consider a sampler & decision-maker pair that works collaboratively and distributively to achieve a shared goal of communications. We formulate a two-agent infinite-horizon Decentralized Partially Observable Markov Decision Process (Dec-POMDP) to conjointly deduce the optimal deterministic sampling policy and decision-making policy. To circumvent the curse of dimensionality in obtaining an optimal deterministic joint policy through Brute-Force-Search, a sub-optimal yet computationally efficient algorithm is developed. This algorithm is predicated on the search for a Nash Equilibrium between the sampler and the decision-maker. Simulation results reveal that the proposed sampler & decision-maker co-design surpasses the current literature on AoI and its variants in terms of both goal achievement utility and sparse sampling rate, signifying progress in the semantics-conscious, goal-driven sparse sampling design.

Submitted 2 July, 2023; originally announced July 2023.

Comments: 30 pages, 9 figures. arXiv admin note: substantial text overlap with arXiv:2305.04083

arXiv:2306.15561 [pdf, other]

You Can Mask More For Extremely Low-Bitrate Image Compression

Authors: Anqi Li, Feng Li, Jiaxin Han, Huihui Bai, Runmin Cong, Chunjie Zhang, Meng Wang, Weisi Lin, Yao Zhao

Abstract: Learned image compression (LIC) methods have experienced significant progress during recent years. However, these methods are primarily dedicated to optimizing the rate-distortion (R-D) performance at medium and high bitrates (> 0.1 bits per pixel (bpp)), while research on extremely low bitrates is limited. Besides, existing methods fail to explicitly explore the image structure and texture components crucial for image compression, treating them equally alongside uninformative components in networks. This can cause severe perceptual quality degradation, especially under low-bitrate scenarios. In this work, inspired by the success of pre-trained masked autoencoders (MAE) in many downstream tasks, we propose to rethink its mask sampling strategy from structure and texture perspectives for high redundancy reduction and discriminative feature representation, further unleashing the potential of LIC methods. Therefore, we present a dual-adaptive masking approach (DA-Mask) that samples visible patches based on the structure and texture distributions of original images. We combine DA-Mask and pre-trained MAE in masked image modeling (MIM) as an initial compressor that abstracts informative semantic context and texture representations. Such a pipeline can well cooperate with LIC networks to achieve further secondary compression while preserving promising reconstruction quality. Consequently, we propose a simple yet effective masked compression model (MCM), the first framework that unifies MIM and LIC end-to-end for extremely low-bitrate image compression. Extensive experiments have demonstrated that our approach outperforms recent state-of-the-art methods in R-D performance, visual quality, and downstream applications, at very low bitrates. Our code is available at https://github.com/lianqi1008/MCM.git.

Submitted 27 June, 2023; originally announced June 2023.

Comments: Under review

arXiv:2306.14509 [pdf, ps, other]

Faster-Than-Nyquist Symbol-Level Precoding for Wideband Integrated Sensing and Communications

Authors: Zihan Liao, Fan Liu, Ang Li, Christos Masouros

Abstract: In this paper, we present an innovative symbol-level precoding (SLP) approach for a wideband multi-user multi-input multi-output (MU-MIMO) downlink Integrated Sensing and Communications (ISAC) system employing faster-than-Nyquist (FTN) signaling. Our proposed technique minimizes the minimum mean squared error (MMSE) for the sensed parameter estimation while ensuring the communication per-user quality-of-service through the utilization of constructive interference (CI) methodologies. While the formulated problem is non-convex in general, we tackle this issue using proficient minorization and successive convex approximation (SCA) strategies. Numerical results substantiate that our FTN-ISAC-SLP framework significantly enhances communication throughput while preserving satisfactory sensing performance.

Submitted 26 June, 2023; originally announced June 2023.

arXiv:2306.08454 [pdf, other]

Gesper: A Restoration-Enhancement Framework for General Speech Reconstruction

Authors: Wenzhe Liu, Yupeng Shi, Jun Chen, Wei Rao, Shulin He, Andong Li, Yannan Wang, Zhiyong Wu

Abstract: This paper describes a real-time General Speech Reconstruction (Gesper) system submitted to the ICASSP 2023 Speech Signal Improvement (SSI) Challenge. This novel proposed system is a two-stage architecture, in which the speech restoration is performed, and then cascaded by speech enhancement. We propose a complex spectral mapping-based generative adversarial network (CSM-GAN) as the speech restoration module for the first time. For noise suppression and dereverberation, the enhancement module is performed with fullband-wideband parallel processing. On the blind test set of ICASSP 2023 SSI Challenge, the proposed Gesper system, which satisfies the real-time condition, achieves 3.27 P.804 overall mean opinion score (MOS) and 3.35 P.835 overall MOS, ranked 1st in both track 1 and track 2.

Submitted 14 June, 2023; originally announced June 2023.

Comments: Accepted by InterSpeech 2023

arXiv:2306.07691 [pdf, other]

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Authors: Yinghao Aaron Li, Cong Han, Vinay S. Raghavan, Gavin Mischler, Nima Mesgarani

Abstract: In this paper, we present StyleTTS 2, a text-to-speech (TTS) model that leverages style diffusion and adversarial training with large speech language models (SLMs) to achieve human-level TTS synthesis. StyleTTS 2 differs from its predecessor by modeling styles as a latent random variable through diffusion models to generate the most suitable style for the text without requiring reference speech, achieving efficient latent diffusion while benefiting from the diverse speech synthesis offered by diffusion models. Furthermore, we employ large pre-trained SLMs, such as WavLM, as discriminators with our novel differentiable duration modeling for end-to-end training, resulting in improved speech naturalness. StyleTTS 2 surpasses human recordings on the single-speaker LJSpeech dataset and matches it on the multispeaker VCTK dataset as judged by native English speakers. Moreover, when trained on the LibriTTS dataset, our model outperforms previous publicly available models for zero-shot speaker adaptation. This work achieves the first human-level TTS on both single and multispeaker datasets, showcasing the potential of style diffusion and adversarial training with large SLMs. The audio demos and source code are available at https://styletts2.github.io/.

Submitted 19 November, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

Comments: NeurIPS 2023

arXiv:2306.02251 [pdf]

Effects of Tonal Coarticulation and Prosodic Positions on Tonal Contours of Low Rising Tones: In the Case of Xiamen Dialect

Authors: Yiying Hu, Hui Feng, Qinghua Zhao, Aijun Li

Abstract: Few studies have worked on the effects of tonal coarticulation and prosodic positions on the low rising tone in Xiamen Dialect. This study addressed such an issue. To do so, a new method, the Tonal Contour Analysis in Tonal Triangle, was proposed to measure the subtle curvature of the tonal contour. Findings are as follows: (1) The low rising tone in Xiamen Dialect has a tendency towards the falling-rising tone, which is significantly affected by the tonal coarticulation and prosodic positions. (2) The low rising tone presents as a falling-rising tone when preceded by a tone with a high offset, and as a low rising tone when preceded by a tone that ends up low. (3) The curvature of the low rising tone is greatest in the sentence-initial position, and is positively correlated to its own duration.

Submitted 3 June, 2023; originally announced June 2023.

Comments: To be published in InterSpeech 2023

arXiv:2305.18441 [pdf, other]

doi 10.21437/Interspeech.2023-2297

DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes

Authors: Xilin Jiang, Yinghao Aaron Li, Nima Mesgarani

Abstract: Lifelong audio feature extraction involves learning new sound classes incrementally, which is essential for adapting to new data distributions over time. However, optimizing the model only on new data can lead to catastrophic forgetting of previously learned tasks, which undermines the model's ability to perform well over the long term. This paper introduces a new approach to continual audio representation learning called DeCoR. Unlike other methods that store previous data, features, or models, DeCoR indirectly distills knowledge from an earlier model to the latest by predicting quantization indices from a delayed codebook. We demonstrate that DeCoR improves acoustic scene classification accuracy and integrates well with continual self-supervised representation learning. Our approach introduces minimal storage and computation overhead, making it a lightweight and efficient solution for continual learning.

Submitted 28 May, 2023; originally announced May 2023.

Comments: INTERSPEECH 2023

Journal ref: Proc. INTERSPEECH 2023, pp.2818--2822

arXiv:2305.17883 [pdf, other]

Maximizing Safety and Efficiency for Cooperative Lane-Changing: A Minimally Disruptive Approach

Authors: Andres S. Chavez Armijos, Anni Li, Christos G. Cassandras

Abstract: This paper addresses cooperative lane-changing maneuvers in mixed traffic, aiming to minimize traffic flow disruptions while accounting for uncooperative vehicles. The proposed approach adopts controllers combining Optimal control with Control Barrier Functions (OCBF controllers) which guarantee spatio-temporal constraints through the use of fixed-time convergence. Additionally, we introduce robustness to disturbances by deriving a method for handling worst-case disturbances using the dual of a linear programming problem. We present a near-optimal solution that ensures safety, optimality, and robustness to changing behavior of uncooperative vehicles. Simulations demonstrate the effectiveness of the proposed approach in enhancing efficiency and safety.

Submitted 30 May, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

arXiv:2305.10953 [pdf, other]

Detecting the driver nodes of temporal networks

Authors: Tingting Qin, Gaopeng Duan, Aming Li

Abstract: Detecting the driver nodes of complex networks has garnered significant attention recently to control complex systems to desired behaviors, where nodes represent system components and edges encode their interactions. Driver nodes, which are directly controlled by external inputs, play a crucial role in controlling all network nodes. While many approaches have been proposed to identify driver nodes of static networks, we still lack an effective algorithm to control ubiquitous temporal networks, where network structures evolve over time. Here we propose an effective online time-accelerated heuristic algorithm (OTaHa) to detect driver nodes of temporal networks. Together with theoretical analysis and numerical simulations on synthetic and empirical temporal networks, we show that OTaHa offers multiple sets of driver nodes, and noticeably outperforms existing methods in terms of accuracy and execution time. We further report that most edges are redundant in controlling temporal networks although the complete instantaneous signal-carrying edges cannot be guaranteed. Moreover, removing edges with high edge betweenness (the number of all-pairs shortest paths passing through the edge) significantly impedes the overall controllability. Our work provides an effective algorithm and paves the way for subsequent explorations on achieving the ultimate control of temporal networks.

Submitted 18 May, 2023; originally announced May 2023.

arXiv:2305.09328 [pdf, ps, other]

Performance Analysis of NOMA-RIS aided Integrated Navigation and Communication (INAC) Networks

Authors: Tianwei Hou, Anna Li

Abstract: Satellite communication constitutes a promising solution for the sixth generation (6G) wireless networks in terms of providing global communication services. In order to provide a cost-effective satellite network, we propose a novel medium-earth-orbit (MEO) satellite aided integrated-navigation-and-communication (INAC) network. To overcome the severe path loss of MEO satellites, we conceive a network for simultaneous serving navigation and communication for ground users by adopting the non-orthogonal multiple access (NOMA) technique and the reconfigurable intelligent surface technique. Based on the power allocation strategies, communication-oriented (CO-) and navigation-oriented (NO-) INAC scenarios are proposed. We first derive the closed-form expressions for the new channel statistics, outage probability and channel capacity of the INAC-user. For gleaning further insights, the diversity orders and navigation accuracy are evaluated for illustrating the performance of the INAC networks. According to our analysis, when RIS elements are sufficient, the proposed INAC network can perform better than conventional terrestrial communication networks in terms of channel capacity. Numerical results are provided for confirming that the NO-INAC and CO-INAC scenarios have superior performance for communication in the low signal-to-noise-ratio (SNR) regimes and high SNR regimes, respectively, which indicates a hybrid CO/NO-INAC network is preferable.

Submitted 16 May, 2023; originally announced May 2023.

arXiv:2305.04083 [pdf, other]

Goal-oriented Tensor: Beyond AoI Towards Semantics-Empowered Goal-oriented Communications

Authors: Aimin Li, Shaohua Wu, Sumei Sun

Abstract: The intricate interplay of source dynamics, unreliable channels, and staleness of information has long been recognized as a significant impediment for the receiver to achieve accurate, timely, and most importantly, goal-oriented decision making. Thus, a plethora of promising metrics, such as Age of Information, Value of Information, and Mean Square Error, have emerged to quantify these underlying adverse factors. Following this avenue, optimizing these metrics has indirectly improved the utility of goal-oriented decision making. Nevertheless, no metric has hitherto been expressly devised to evaluate the utility of a goal-oriented decision-making process. To this end, this paper investigates a novel performance metric, the Goal-oriented Tensor (GoT), to directly quantify the impact of semantic mismatches on the goal-oriented decision making. Based on the GoT, we consider a sampler-decision maker pair that work collaboratively and distributively to achieve a shared goal of communications. We formulate an infinite-horizon Decentralized Partially Observable Markov Decision Process (Dec-POMDP) to conjointly deduce the optimal deterministic sampling policy and decision-making policy. The simulation results reveal that the sampler-decision maker co-design surpasses the current literature on AoI and its variants in terms of both goal achievement utility and sparse sampling rate, signifying a notable accomplishment for a sparse sampler and goal-oriented decision maker co-design.

Submitted 9 May, 2023; v1 submitted 6 May, 2023; originally announced May 2023.

Comments: 6 pages, 4 figures, submitted to 2023 Globecom

arXiv:2304.07813 [pdf, other]

Deep Reinforcement Learning-Assisted Age-optimal Transmission Policy for HARQ-aided NOMA Networks

Authors: Kunpeng Liu, Aimin Li, Shaohua Wu

Abstract: The recent interweaving of AI-6G technologies has sparked extensive research interest in further enhancing reliable and timely communications. Age of Information (AoI), as a novel and integrated metric implying the intricate trade-offs among reliability, latency, and update frequency, has been well-researched since its conception. This paper contributes new results in this area by employing a Deep Reinforcement Learning (DRL) approach to intelligently decide how to allocate power resources and when to retransmit in a freshness-sensitive downlink multi-user Hybrid Automatic Repeat reQuest with Chase Combining (HARQ-CC) aided Non-Orthogonal Multiple Access (NOMA) network. Specifically, an AoI minimization problem is formulated as a Markov Decision Process (MDP) problem. Then, to achieve deterministic, age-optimal, and intelligent power allocations and retransmission decisions, the Double-Dueling-Deep Q Network (DQN) is adopted. Furthermore, a more flexible retransmission scheme, referred to as the Retransmit-At-Will scheme, is proposed to further facilitate the timeliness of the HARQ-aided NOMA network. Simulation results verify the superiority of the proposed intelligent scheme and demonstrate the threshold structure of the retransmission policy. Extensive simulation results are also used to discuss whether user pairing is necessary.

Submitted 16 April, 2023; originally announced April 2023.

arXiv:2304.04142 [pdf]

Slideflow: Deep Learning for Digital Histopathology with Real-Time Whole-Slide Visualization

Authors: James M. Dolezal, Sara Kochanny, Emma Dyer, Andrew Srisuwananukorn, Matteo Sacco, Frederick M. Howard, Anran Li, Prajval Mohan, Alexander T. Pearson

Abstract: Deep learning methods have emerged as powerful tools for analyzing histopathological images, but current methods are often specialized for specific domains and software environments, and few open-source options exist for deploying models in an interactive interface. Experimenting with different deep learning approaches typically requires switching software libraries and reprocessing data, reducing the feasibility and practicality of experimenting with new architectures. We developed a flexible deep learning library for histopathology called Slideflow, a package which supports a broad array of deep learning methods for digital pathology and includes a fast whole-slide interface for deploying trained models. Slideflow includes unique tools for whole-slide image data processing, efficient stain normalization and augmentation, weakly-supervised whole-slide classification, uncertainty quantification, feature generation, feature space analysis, and explainability. Whole-slide image processing is highly optimized, enabling whole-slide tile extraction at 40X magnification in 2.5 seconds per slide. The framework-agnostic data processing pipeline enables rapid experimentation with new methods built with either TensorFlow or PyTorch, and the graphical user interface supports real-time visualization of slides, predictions, heatmaps, and feature space characteristics on a variety of hardware devices, including ARM-based devices such as the Raspberry Pi.

Submitted 8 April, 2023; originally announced April 2023.

arXiv:2303.16948 [pdf, other]

Cooperative Lane Changing in Mixed Traffic can be Robust to Human Driver Behavior

Authors: Anni Li, Andres S. Chavez Armijos, Christos G. Cassandras

Abstract: We derive time and energy-optimal control policies for a Connected Autonomous Vehicle (CAV) to complete lane change maneuvers in mixed traffic. The interaction between CAVs and Human-Driven Vehicles (HDVs) requires designing the best possible response of a CAV to actions by its neighboring HDVs. This interaction is formulated using a bilevel optimization setting with an appropriate behavioral model for an HDV. Then, an iterated best response (IBR) method is used to determine a Nash equilibrium. However, we also show that when a common and simple-to-detect condition applies, the optimal lane-changing policy is in fact independent of HDV behavior, with a CAV changing lanes by cooperating with another CAV in the target lane and always merging ahead of it. Thus, the dependence on the interaction between CAVs and HDVs may be eliminated in such cases. Simulation results are included to show the effectiveness of our controllers in terms of cost, safety guarantees, and disruption to the traffic flow when uncontrollable HDVs are present.

Submitted 29 March, 2023; originally announced March 2023.

arXiv:2303.05991 [pdf, other]

doi 10.1109/LCSYS.2023.3279008

Minimally Disruptive Cooperative Lane-change Maneuvers

Authors: Behdad Chalaki, Vaishnav Tadiparthi, Hossein Nourkhiz Mahjoub, Jovin D'sa, Ehsan Moradi-Pari, Andres S. Chavez Armijos, Anni Li, Christos G. Cassandras

Abstract: A lane-change maneuver on a congested highway could be severely disruptive or even infeasible without the cooperation of neighboring cars. However, cooperation with other vehicles does not guarantee that the performed maneuver will not have a negative impact on traffic flow unless it is explicitly considered in the cooperative controller design. In this letter, we present a socially compliant framework for cooperative lane-change maneuvers for an arbitrary number of CAVs on highways that aims to interrupt traffic flow as minimally as possible. Moreover, we explicitly impose feasibility constraints in the optimization formulation by using reachability set theory, leading to a unified design that removes the need for an iterative procedure used in prior work. We quantitatively evaluate the effectiveness of our framework and compare it against previously offered approaches in terms of maneuver time and incurred throughput disruption.

Submitted 10 March, 2023; originally announced March 2023.

Comments: 6 pages, 2 figures

Journal ref: IEEE Control Systems Letters, vol. 7, pp. 1766-1771, 2023

arXiv:2303.04432 [pdf, ps, other]

Deep Learning-Based Channel Extrapolation for Pattern Reconfigurable Massive MIMO

Authors: Mu Liang, Ang Li

Abstract: Reconfigurable antennas that can dynamically change their operation state exhibit excellent adaptivity and flexibility over traditional antennas, and MIMO arrays that consist of multifunctional and reconfigurable antennas (MRAs) are foreseen as one promising solution towards future Holographic MIMO. Specifically, in pattern reconfigurable MIMO (PR-MIMO) communication systems, accurate acquisition of channel state information (CSI) of all the radiation modes is a challenging task, because using conventional pilot-based channel estimation techniques in PR-MIMO systems incurs overwhelming pilot overheads. In this letter, we leverage deep learning methods to design a PR neural network, which can use the estimated CSI for one radiation mode to infer CSIs for the other radiation modes. In order to reduce the pilot overheads, we propose a new channel estimation method specifically for PR-MIMO systems, which divides the transmit antennas of PR-MIMO into groups, with antennas in different groups employing different radiation modes. Compared with conventional fully-connected real-valued deep neural networks (DNN), the PR neural network, which uses complex-valued coefficients, can work directly in the complex domain. Experiment results show that the proposed channel extrapolation method offers significant performance gains in terms of extrapolation accuracy over benchmark schemes.

Submitted 6 April, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

arXiv:2302.05756 [pdf, other]

Improved Decoding of Attentional Selection in Multi-Talker Environments with Self-Supervised Learned Speech Representation

Authors: Cong Han, Vishal Choudhari, Yinghao Aaron Li, Nima Mesgarani

Abstract: Auditory attention decoding (AAD) is a technique used to identify and amplify the talker that a listener is focused on in a noisy environment. This is done by comparing the listener's brainwaves to a representation of all the sound sources to find the closest match. The representation is typically the waveform or spectrogram of the sounds. The effectiveness of these representations for AAD is uncertain. In this study, we examined the use of self-supervised learned speech representation in improving the accuracy and speed of AAD. We recorded the brain activity of three subjects using invasive electrocorticography (ECoG) as they listened to two conversations and focused on one. We used WavLM to extract a latent representation of each talker and trained a spatiotemporal filter to map brain activity to intermediate representations of speech. During the evaluation, the reconstructed representation is compared to each speaker's representation to determine the target speaker. Our results indicate that speech representation from WavLM provides better decoding accuracy and speed than the speech envelope and spectrogram. Our findings demonstrate the advantages of self-supervised learned speech representation for auditory attention decoding and pave the way for developing brain-controlled hearable technologies.

Submitted 11 February, 2023; originally announced February 2023.

arXiv:2301.08810 [pdf, other]

Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions

Authors: Yinghao Aaron Li, Cong Han, Xilin Jiang, Nima Mesgarani

Abstract: Large-scale pre-trained language models have been shown to be helpful in improving the naturalness of text-to-speech (TTS) models by enabling them to produce more naturalistic prosodic patterns. However, these models are usually word-level or sub-phoneme-level and jointly trained with phonemes, making them inefficient for the downstream TTS task where only phonemes are needed. In this work, we propose a phoneme-level BERT (PL-BERT) with a pretext task of predicting the corresponding graphemes along with the regular masked phoneme predictions. Subjective evaluations show that our phoneme-level BERT encoder has significantly improved the mean opinion scores (MOS) of rated naturalness of synthesized speech compared with the state-of-the-art (SOTA) StyleTTS baseline on out-of-distribution (OOD) texts.

Submitted 20 January, 2023; originally announced January 2023.

arXiv:2301.01940 [pdf, other]

Enabling Augmented Segmentation and Registration in Ultrasound-Guided Spinal Surgery via Realistic Ultrasound Synthesis from Diagnostic CT Volume

Authors: Ang Li, Jiayi Han, Yongjian Zhao, Keyu Li, Li Liu

Abstract: This paper aims to tackle the issues of unavailable or insufficient clinical US data and meaningful annotation to enable bone segmentation and registration for US-guided spinal surgery. While US is not a standard paradigm for spinal surgery, the scarcity of intra-operative clinical US data is an insurmountable bottleneck in training a neural network. Moreover, due to the characteristics of US imaging, it is difficult to clearly annotate bone surfaces, which causes the trained neural network to lose attention to the details. Hence, we propose an in silico bone US simulation framework that synthesizes realistic US images from diagnostic CT volume. Afterward, using these simulated bone US images, we train a lightweight vision transformer model that can achieve accurate and on-the-fly bone segmentation for spinal sonography. In the validation experiments, the realistic US simulation was derived from diagnostic spinal CT volume to facilitate a radiation-free US-guided pedicle screw placement procedure. When it is employed for training the bone segmentation task, the Chamfer distance achieves 0.599mm; when it is applied for CT-US registration, the associated bone segmentation accuracy achieves 0.93 in Dice, and the registration accuracy based on the segmented point cloud is 0.13~3.37mm in a complication-free manner. While bone US images exhibit strong echoes at the medium interface, the model may fail to distinguish between thin interfaces and bone surfaces when relying only on small neighborhood information. To overcome these shortcomings, we propose to utilize a Long-range Contrast Learning Module to fully explore the long-range contrast between the candidates and their surrounding pixels.

Submitted 5 January, 2023; originally announced January 2023.

Comments: Submitted to IEEE Transactions on Automation Science and Engineering. Copyright may be transferred without notice, after which this version may no longer be accessible. Note that the abstract is shorter than that in the pdf file due to character limitations

arXiv:2212.14227 [pdf, other]

StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models

Authors: Yinghao Aaron Li, Cong Han, Nima Mesgarani

Abstract: One-shot voice conversion (VC) aims to convert speech from any source speaker to an arbitrary target speaker with only a few seconds of reference speech from the target speaker. This relies heavily on disentangling the speaker's identity and speech content, a task that still remains challenging. Here, we propose a novel approach to learning disentangled speech representation by transfer learning from style-based text-to-speech (TTS) models. With cycle consistent and adversarial training, the style-based TTS models can perform transcription-guided one-shot VC with high fidelity and similarity. By learning an additional mel-spectrogram encoder through a teacher-student knowledge transfer and novel data augmentation scheme, our approach results in disentangled speech representation without needing the input text. The subjective evaluation shows that our approach can significantly outperform the previous state-of-the-art one-shot voice conversion models in both naturalness and similarity.

Submitted 29 December, 2022; originally announced December 2022.

Comments: SLT 2022

arXiv:2211.16764 [pdf, other]

A General Unfolding Speech Enhancement Method Motivated by Taylor's Theorem

Authors: Andong Li, Guochen Yu, Chengshi Zheng, Wenzhe Liu, Xiaodong Li

Abstract: While deep neural networks have facilitated significant advancements in the field of speech enhancement, most existing methods are developed following either empirical or relatively blind criteria, lacking adequate guidelines in pipeline design. Inspired by Taylor's theorem, we propose a general unfolding framework for both single- and multi-channel speech enhancement tasks. Concretely, we formulate the complex spectrum recovery into the spectral magnitude mapping in the neighborhood space of the noisy mixture, in which an unknown sparse term is introduced and applied for phase modification in advance. Based on that, the mapping function is decomposed into the superimposition of the 0th-order and high-order polynomials in Taylor's series, where the former coarsely removes the interference in the magnitude domain and the latter progressively complements the remaining spectral detail in the complex spectrum domain. In addition, we study the relation between adjacent order terms and reveal that each high-order term can be recursively estimated from its lower-order term. We then propose to evaluate each high-order term using a surrogate function with trainable weights so that the whole system can be trained in an end-to-end manner. Given that the proposed framework is devised based on Taylor's theorem, it possesses improved internal flexibility. Extensive experiments are conducted on WSJ0-SI84, DNS-Challenge, Voicebank+Demand, spatialized Librispeech, and L3DAS22 multi-channel speech enhancement challenge datasets. Quantitative results show that the proposed approach yields competitive performance over existing top-performing approaches in terms of multiple objective metrics.

Submitted 28 March, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

Comments: Submitted to TASLP, revised version, 17 pages

arXiv:2211.14818 [pdf, other]

Speeding-up Symbol-Level Precoding Using Separable and Dual Optimizations

Authors: Junwen Yang, Ang Li, Xuewen Liao, Christos Masouros

Abstract: Symbol-level precoding (SLP) manipulates the transmitted signals to accurately exploit the multi-user interference (MUI) in the multi-user downlink. This ensures that all the resultant interference contributes to correct detection, which is the so-called constructive interference (CI). Its performance superiority comes at the cost of solving a nonlinear optimization problem on a symbol-by-symbol basis, for which the resulting complexity becomes prohibitive in realistic wireless communication systems. In this paper, we investigate low-complexity SLP algorithms for both phase-shift keying (PSK) and quadrature amplitude modulation (QAM). Specifically, we first prove that the max-min SINR balancing (SB) SLP problem for PSK signaling is not separable, which is contrary to the power minimization (PM) SLP problem, and accordingly, existing decomposition methods are not applicable. Next, we establish an explicit duality between the PM-SLP and SB-SLP problems for PSK modulation. The proposed duality facilitates obtaining the solution to the SB-SLP given the solution to the PM-SLP without the need for a one-dimensional search, and vice versa. We then propose a closed-form power scaling algorithm to solve the SB-SLP via PM-SLP to take advantage of the separability of the PM-SLP. As for QAM modulation, we convert the PM-SLP problem into a separable equivalent optimization problem, and decompose the new problem into several simple parallel subproblems with closed-form solutions, leveraging the proximal Jacobian alternating direction method of multipliers (PJ-ADMM). We further prove that the proposed duality can be generalized to the multi-level modulation case, based on which a power scaling parallel inverse-free algorithm is also proposed to solve the SB-SLP for QAM signaling. Numerical results show that the proposed algorithms offer optimal performance with lower complexity than the state-of-the-art.

Submitted 27 November, 2022; originally announced November 2022.

Comments: 30 pages, 11 figures

arXiv:2211.12024 [pdf, other]

TaylorBeamixer: Learning Taylor-Inspired All-Neural Multi-Channel Speech Enhancement from Beam-Space Dictionary Perspective

Authors: Andong Li, Guochen Yu, Wenzhe Liu, Xiaodong Li, Chengshi Zheng

Abstract: Despite the promising performance of existing frame-wise all-neural beamformers in the speech enhancement field, it remains unclear what the underlying mechanism is. In this paper, we revisit the beamforming behavior from the beam-space dictionary perspective and formulate it into the learning and mixing of different beam-space components. Based on that, we propose an all-neural beamformer called TaylorBM to simulate Taylor's series expansion operation in which the 0th-order term serves as a spatial filter to conduct the beam mixing, and several high-order terms are tasked with residual noise cancellation for post-processing. The whole system is devised to work in an end-to-end manner. Experiments are conducted on the spatialized LibriSpeech corpus and results show that the proposed approach outperforms existing advanced baselines in terms of evaluation metrics.

Submitted 30 November, 2022; v1 submitted 22 November, 2022; originally announced November 2022.

Comments: In submission to ICASSP 2023, 5 pages

arXiv:2211.08636 [pdf, other]

Cooperative Energy and Time-Optimal Lane Change Maneuvers with Minimal Highway Traffic Disruption

Authors: Andres S. Chavez Armijos, Anni Li, Christos G. Cassandras, Yasir K. Al-Nadawi, Hidekazu Araki, Behdad Chalaki, Ehsan Moradi-Pari, Hossein Nourkhiz Mahjoub, Vaishnav Tadiparthi

Abstract: We derive optimal control policies for a Connected Automated Vehicle (CAV) and cooperating neighboring CAVs to carry out a lane change maneuver consisting of a longitudinal phase where the CAV properly positions itself relative to the cooperating neighbors and a lateral phase where it safely changes lanes. In contrast to prior work on this problem, where the CAV "selfishly" only seeks to minimize its maneuver time, we seek to ensure that the fast-lane traffic flow is minimally disrupted (through a properly defined metric). Additionally, when performing lane-changing maneuvers, we optimally select the cooperating vehicles from a set of feasible neighboring vehicles and experimentally show that the highway throughput is improved compared to the baseline case of human-driven vehicles changing lanes with no cooperation. When feasible solutions do not exist for a given maximal allowable disruption, we include a time relaxation method trading off a longer maneuver time with reduced disruption. Our analysis is also extended to multiple sequential maneuvers. Simulation results show the effectiveness of our controllers in terms of safety guarantees and up to 16% and 90% average throughput and maneuver time improvement respectively when compared to maneuvers with no cooperation.

Submitted 15 November, 2022; originally announced November 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2203.17102

arXiv:2211.05910 [pdf, other]

Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report

Authors: Andrey Ignatov, Radu Timofte, Maurizio Denna, Abdel Younes, Ganzorig Gankhuyag, Jingang Huh, Myeong Kyun Kim, Kihwan Yoon, Hyeon-Cheol Moon, Seungho Lee, Yoonsik Choe, Jinwoo Jeong, Sungjei Kim, Maciej Smyl, Tomasz Latkowski, Pawel Kubik, Michal Sokolski, Yujie Ma, Jiahao Chao, Zhou Zhou, Hongfan Gao, Zhengfeng Yang, Zhenbing Zeng, Zhengyang Zhuge, Chenghua Li, et al. (71 additional authors not shown)

Abstract: Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to do a high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating a rate of up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.

Submitted 7 November, 2022; originally announced November 2022.

Comments: arXiv admin note: text overlap with arXiv:2105.07825, arXiv:2105.08826, arXiv:2211.04470, arXiv:2211.03885, arXiv:2211.05256

arXiv:2211.01661 [pdf, other]

doi 10.3390/e25010146

Pairing optimization via statistics: Algebraic structure in pairing problems and its application to performance enhancement

Authors: Naoki Fujita, André Röhm, Takatomo Mihana, Ryoichi Horisaki, Aohan Li, Mikio Hasegawa, Makoto Naruse

Abstract: Fully pairing all elements of a set while attempting to maximize the total benefit is a combinatorially difficult problem. Such pairing problems naturally appear in various situations in science, technology, economics, and other fields. In our previous study, we proposed an efficient method to infer the underlying compatibilities among the entities, under the constraint that only the total compatibility is observable. Furthermore, by transforming the pairing problem into a traveling salesman problem with a multi-layer architecture, a pairing optimization algorithm was successfully demonstrated to derive a high-total-compatibility pairing. However, there is substantial room for further performance enhancement by further exploiting the underlying mathematical properties. In this study, we prove the existence of algebraic structures in the pairing problem. We transform the initially estimated compatibility information into an equivalent form where the variance of the individual compatibilities is minimized. We then demonstrate that the total compatibility obtained when using the heuristic pairing algorithm on the transformed problem is significantly higher compared to the previous method. With this improved perspective on the pairing problem using fundamental mathematical properties, we can contribute to practical applications such as wireless communications beyond 5G, where efficient pairing is of critical importance.

Submitted 3 November, 2022; originally announced November 2022.

Showing 1–50 of 131 results for author: Li, A