Search | arXiv e-print repository

arXiv:2407.00933 [pdf, other]

Reconfigurable Intelligent Computational Surfaces for MEC-Assisted Autonomous Driving Networks: Design Optimization and Analysis

Authors: Xueyao Zhang, Bo Yang, Zhiwen Yu, Xuelin Cao, George C. Alexandropoulos, Yan Zhang, Merouane Debbah, Chau Yuen

Abstract: This paper investigates autonomous driving safety improvement via task offloading from cellular vehicles (CVs) to a multi-access edge computing (MEC) server using vehicle-to-infrastructure (V2I) links. Considering that the latter links can be reused by vehicle-to-vehicle (V2V) communications to improve spectrum utilization, the receiver of the V2I link may suffer from severe interference that can cause outages during the task offloading. To tackle this issue, we propose the deployment of a reconfigurable intelligent computational surface (RICS) whose computationally capable metamaterials are leveraged to jointly enable V2I reflective links as well as to implement interference cancellation at the V2V links. We devise a joint optimization formulation for the task offloading ratio between the CVs and the MEC server, the spectrum sharing strategy between V2V and V2I communications, as well as the RICS reflection and refraction matrices to maximize an autonomous driving safety task. Due to the non-convexity of the problem and the coupling among its free variables, we transform it into a more tractable equivalent form, which is then decomposed into three sub-problems solved via an alternate approximation method. Our simulation results showcase that the proposed RICS-assisted offloading framework significantly improves the safety of the considered autonomous driving network, yielding a nearly 34% improvement in the safety coefficient of the CVs. In addition, it is demonstrated that the V2V data rate can be improved by around 60%, indicating that the RICS-induced adjustment of the signals can effectively mitigate interference at the V2V link.

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2406.19959 [pdf, other]

RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization

Authors: Bing Yang, Changsheng Quan, Yabo Wang, Pengyu Wang, Yujie Yang, Ying Fang, Nian Shao, Hui Bu, Xin Xu, Xiaofei Li

Abstract: The training of deep learning-based multichannel speech enhancement and source localization systems relies heavily on the simulation of room impulse response and multichannel diffuse noise, due to the lack of large-scale real-recorded datasets. However, the acoustic mismatch between simulated and real-world data could degrade the model performance when applied in real-world scenarios. To bridge this simulation-to-real gap, this paper presents a new relatively large-scale Real-recorded and annotated Microphone Array speech&Noise (RealMAN) dataset. The proposed dataset is valuable in two aspects: 1) benchmarking speech enhancement and localization algorithms in real scenarios; 2) offering a substantial amount of real-world training data for potentially improving the performance of real-world applications. Specifically, a 32-channel array with high-fidelity microphones is used for recording. A loudspeaker is used for playing source speech signals. A total of 83 hours of speech signals (48 hours for static speaker and 35 hours for moving speaker) are recorded in 32 different scenes, and 144 hours of background noise are recorded in 31 different scenes. Both speech and noise recording scenes cover various common indoor, outdoor, semi-outdoor and transportation environments, which enables the training of general-purpose speech enhancement and source localization networks. To obtain the task-specific annotations, the azimuth angle of the loudspeaker is annotated with an omnidirectional fisheye camera by automatically detecting the loudspeaker. The direct-path signal is set as the target clean speech for speech enhancement, which is obtained by filtering the source speech signal with an estimated direct-path propagation filter.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.18055 [pdf, other]

Filtering Reconfigurable Intelligent Computational Surface for RF Spectrum Purification

Authors: Kaining Wang, Bo Yang, Zhiwen Yu, Xuelin Cao, Mérouane Debbah, Chau Yuen

Abstract: The increasing demand for communication is degrading the electromagnetic (EM) transmission environment due to severe EM interference, significantly reducing the efficiency of the radio frequency (RF) spectrum. Metasurfaces, a promising technology for controlling desired EM waves, have recently received significant attention from both academia and industry. However, the potential impact of out-of-band signals has been largely overlooked, leading to RF spectrum pollution and degradation of wireless transmissions. To address this issue, we propose a novel surface structure called the Filtering Reconfigurable Intelligent Computational Surface (FRICS). We introduce two types of FRICS structures: one that dynamically reflects resonance band signals through a tunable spatial filter while absorbing out-of-band signals using metamaterials, and another that dynamically amplifies in-band signals using computational metamaterials while reflecting out-of-band signals. To evaluate the performance of FRICS, we implement it in device-to-device (D2D) communication and vehicle-to-everything (V2X) scenarios. The experiments demonstrate the superiority of FRICS in signal-to-interference-plus-noise ratio (SINR) and energy efficiency (EE). Finally, we discuss the critical challenges faced and promising techniques for implementing FRICS in future wireless systems.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.13335 [pdf, other]

AI-Empowered Multiple Access for 6G: A Survey of Spectrum Sensing, Protocol Designs, and Optimizations

Authors: Xuelin Cao, Bo Yang, Kaining Wang, Xinghua Li, Zhiwen Yu, Chau Yuen, Yan Zhang, Zhu Han

Abstract: With the rapidly increasing number of bandwidth-intensive terminals capable of intelligent computing and communication, such as smart devices equipped with shallow neural network models, the complexity of multiple access for these intelligent terminals is increasing due to the dynamic network environment and ubiquitous connectivity in 6G systems. Traditional multiple access (MA) design and optimization methods are gradually losing ground to artificial intelligence (AI) techniques that have proven their superiority in handling complexity. AI-empowered MA and its optimization strategies aimed at achieving high Quality-of-Service (QoS) are attracting more attention, especially in the area of latency-sensitive applications in 6G systems. In this work, we aim to: 1) present the development and comparative evaluation of AI-enabled MA; 2) provide a timely survey focusing on spectrum sensing, protocol design, and optimization for AI-empowered MA; and 3) explore the potential use cases of AI-empowered MA in the typical application scenarios within 6G systems. Specifically, we first present a unified framework of AI-empowered MA for 6G systems by incorporating various promising machine learning techniques in spectrum sensing, resource allocation, MA protocol design, and optimization. We then introduce AI-empowered MA spectrum sensing related to spectrum sharing and spectrum interference management. Next, we discuss the AI-empowered MA protocol designs and implementation methods by reviewing and comparing the state-of-the-art, and we further explore the optimization algorithms related to dynamic resource management, parameter adjustment, and access scheme switching. Finally, we discuss the current challenges, point out open issues, and outline potential future research directions in this field.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.12447 [pdf, other]

Text-aware Speech Separation for Multi-talker Keyword Spotting

Authors: Haoyu Li, Baochen Yang, Yu Xi, Linfeng Yu, Tian Tan, Hao Li, Kai Yu

Abstract: For noisy environments, ensuring the robustness of keyword spotting (KWS) systems is essential. While much research has focused on noisy KWS, less attention has been paid to multi-talker mixed speech scenarios. Unlike the usual cocktail party problem where multi-talker speech is separated using speaker clues, the key challenge here is to extract the target speech for KWS based on text clues. To address this, the paper proposes a novel Text-aware Permutation Determinization Training method for multi-talker KWS with a clue-based Speech Separation front-end (TPDT-SS). Our research highlights the critical role of SS front-ends and shows that incorporating keyword-specific clues into these models can greatly enhance their effectiveness. TPDT-SS shows remarkable success in addressing permutation problems in mixed keyword speech, thereby greatly boosting the performance of the backend. Additionally, fine-tuning our system on unseen mixed speech results in further performance improvement.

Submitted 18 June, 2024; originally announced June 2024.

Comments: Accepted by INTERSPEECH2024

arXiv:2406.11546 [pdf, other]

GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement

Authors: Yifan Yang, Zheshu Song, Jianheng Zhuo, Mingyu Cui, **peng Li, Bo Yang, Yexing Du, Ziyang Ma, Xunying Liu, Ziyuan Wang, Ke Li, Shuai Fan, Kai Yu, Wei-Qiang Zhang, Guoguo Chen, Xie Chen

Abstract: The evolution of speech technology has been spurred by the rapid increase in dataset sizes. Traditional speech models generally depend on a large amount of labeled training data, which is scarce for low-resource languages. This paper presents GigaSpeech 2, a large-scale, multi-domain, multilingual speech recognition corpus. It is designed for low-resource languages and does not rely on paired speech and text data. GigaSpeech 2 comprises about 30,000 hours of automatically transcribed speech, including Thai, Indonesian, and Vietnamese, gathered from unlabeled YouTube videos. We also introduce an automated pipeline for data crawling, transcription, and label refinement. Specifically, this pipeline uses Whisper for initial transcription and TorchAudio for forced alignment, combined with multi-dimensional filtering for data quality assurance. A modified Noisy Student Training is developed to further refine flawed pseudo labels iteratively, thus enhancing model performance. Experimental results on our manually transcribed evaluation set and two public test sets from Common Voice and FLEURS confirm our corpus's high quality and broad applicability. Notably, ASR models trained on GigaSpeech 2 can reduce the word error rate for Thai, Indonesian, and Vietnamese on our challenging and realistic YouTube test set by 25% to 40% compared to the Whisper large-v3 model, with merely 10% of the model parameters. Furthermore, our ASR models trained on GigaSpeech 2 yield superior performance compared to commercial services. We believe that our newly introduced corpus and pipeline will open a new avenue for low-resource speech recognition and significantly facilitate research in this area.

Submitted 17 June, 2024; originally announced June 2024.

Comments: Under review

arXiv:2406.03391 [pdf, other]

Joint Association, Beamforming, and Resource Allocation for Multi-IRS Enabled MU-MISO Systems With RSMA

Authors: Chunjie Wang, Xuhui Zhang, Huijun Xing, Liang Xue, Shuqiang Wang, Yanyan Shen, Bo Yang, ** Guan

Abstract: Intelligent reflecting surface (IRS) and rate-splitting multiple access (RSMA) technologies are at the forefront of enhancing spectrum and energy efficiency in next-generation multi-antenna communication systems. This paper explores an RSMA system with multiple IRSs, and proposes two purpose-driven scheduling schemes, i.e., the exhaustive IRS-aided (EIA) and opportunistic IRS-aided (OIA) schemes. The aim is to optimize the system weighted energy efficiency (EE) under each of the two schemes. Specifically, the Dinkelbach, branch and bound, successive convex approximation, and semidefinite relaxation methods are exploited within the alternating optimization framework to obtain effective solutions to the considered problems. The numerical findings indicate that the EIA scheme exhibits better performance compared to the OIA scheme in diverse scenarios when considering the weighted EE, and the proposed algorithm demonstrates superior performance in comparison to the baseline algorithms.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2405.07021 [pdf, other]

IPDnet: A Universal Direct-Path IPD Estimation Network for Sound Source Localization

Authors: Yabo Wang, Bing Yang, Xiaofei Li

Abstract: Extracting direct-path spatial features is crucial for sound source localization in adverse acoustic environments. This paper proposes the IPDnet, a neural network that estimates direct-path inter-channel phase difference (DP-IPD) of sound sources from microphone array signals. The estimated DP-IPD can be easily translated to source location based on the known microphone array geometry. First, a full-band and narrow-band fusion network is proposed for DP-IPD estimation, in which alternating narrow-band and full-band layers are responsible for estimating the rough DP-IPD information in one frequency band and capturing the frequency correlations of DP-IPD, respectively. Second, a new multi-track DP-IPD learning target is proposed for the localization of a flexible number of sound sources. Third, the IPDnet is extended to handle variable microphone arrays: once trained, it is able to process arbitrary microphone arrays with different numbers of channels and array topologies. Experiments of multiple-moving-speaker localization are conducted on both simulated and real-world data, which show that the proposed full-band and narrow-band fusion network and the proposed multi-track DP-IPD learning target together achieve excellent sound source localization performance. Moreover, the proposed variable-array model generalizes well to unseen microphone arrays.

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2404.13786 [pdf, other]

Soar: Design and Deployment of A Smart Roadside Infrastructure System for Autonomous Driving

Authors: Shuyao Shi, Neiwen Ling, Zhehao Jiang, Xuan Huang, Yuze He, Xiaoguang Zhao, Bufang Yang, Chen Bian, **gfei Xia, Zhenyu Yan, Raymond Yeung, Guoliang Xing

Abstract: Recently, smart roadside infrastructure (SRI) has demonstrated the potential of achieving fully autonomous driving systems. To explore the potential of infrastructure-assisted autonomous driving, this paper presents the design and deployment of Soar, the first end-to-end SRI system specifically designed to support autonomous driving systems. Soar consists of both software and hardware components carefully designed to overcome various system and physical challenges. Soar can leverage the existing operational infrastructure like street lampposts for a lower barrier of adoption. Soar adopts a new communication architecture that comprises a bi-directional multi-hop I2I network and a downlink I2V broadcast service, which are designed based on off-the-shelf 802.11ac interfaces in an integrated manner. Soar also features a hierarchical DL task management framework to achieve desirable load balancing among nodes and enable them to collaborate efficiently to run multiple data-intensive autonomous driving applications. We deployed a total of 18 Soar nodes on existing lampposts on campus, which have been operational for over two years. Our real-world evaluation shows that Soar can support a diverse set of autonomous driving applications and achieve desirable real-time performance and high communication reliability. Our findings and experiences in this work offer key insights into the development and deployment of next-generation smart roadside infrastructure and autonomous driving systems.

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.07215 [pdf, other]

Computation Offloading for Multi-server Multi-access Edge Vehicular Networks: A DDQN-based Method

Authors: Siyu Wang, Bo Yang, Zhiwen Yu, Xuelin Cao, Yan Zhang, Chau Yuen

Abstract: In this paper, we investigate a multi-user offloading problem in the overlapping domain of a multi-server mobile edge computing system. We divide the original problem into two stages: the offloading decision making stage and the request scheduling stage. To prevent the terminal from leaving the service area during offloading, we consider the mobility parameter of the terminal according to the human behaviour model when making the offloading decision, and then introduce a server evaluation mechanism based on both the mobility parameter and the server load to select the optimal offloading server. In order to fully utilise the server resources, we design a double deep Q-network (DDQN)-based reward evaluation algorithm that considers the priority of tasks when scheduling offload requests. Finally, numerical simulations are conducted to verify that our proposed method outperforms traditional mathematical computation methods as well as the DQN algorithm.

Submitted 20 February, 2024; originally announced April 2024.

arXiv:2403.13332 [pdf, other]

TDT-KWS: Fast And Accurate Keyword Spotting Using Token-and-duration Transducer

Authors: Yu Xi, Hao Li, Baochen Yang, Haoyu Li, Hainan Xu, Kai Yu

Abstract: Designing an efficient keyword spotting (KWS) system that delivers exceptional performance on resource-constrained edge devices has long been a subject of significant attention. Existing KWS search algorithms typically follow a frame-synchronous approach, where search decisions are made repeatedly at each frame despite the fact that most frames are keyword-irrelevant. In this paper, we propose TDT-KWS, which leverages token-and-duration Transducers (TDT) for KWS tasks. We also propose a novel KWS task-specific decoding algorithm for Transducer-based models, which supports highly effective frame-asynchronous keyword search in streaming speech scenarios. With evaluations conducted on both the public Hey Snips and self-constructed LibriKWS-20 datasets, our proposed KWS-decoding algorithm produces more accurate results than conventional ASR decoding algorithms. Additionally, TDT-KWS achieves on-par or better wake word detection performance than both RNN-T and traditional TDT-ASR systems while achieving significant inference speed-up. Furthermore, experiments show that TDT-KWS is more robust to noisy environments compared to RNN-T KWS.

Submitted 20 March, 2024; originally announced March 2024.

Comments: Accepted by ICASSP2024

arXiv:2402.15932 [pdf, other]

Scalable Volt-VAR Optimization using RLlib-IMPALA Framework: A Reinforcement Learning Approach

Authors: Alaa Selim, Yanzhu Ye, Junbo Zhao, Bo Yang

Abstract: In the rapidly evolving domain of electrical power systems, Volt-VAR optimization (VVO) is increasingly critical, especially with the burgeoning integration of renewable energy sources. Traditional approaches to learning-based VVO in expansive and dynamically changing power systems are often hindered by computational complexities. To address this challenge, our research presents a novel framework that harnesses the potential of Deep Reinforcement Learning (DRL), specifically utilizing the Importance Weighted Actor-Learner Architecture (IMPALA) algorithm, executed on the RAY platform. This framework, built upon RLlib (an industry standard in Reinforcement Learning), ingeniously capitalizes on the distributed computing capabilities and advanced hyperparameter tuning offered by RAY. This design significantly expedites the exploration and exploitation phases in the VVO solution space. Our empirical results demonstrate that our approach not only surpasses existing DRL methods in achieving superior reward outcomes but also manifests a remarkable tenfold reduction in computational requirements. The integration of our DRL agent with the RAY platform facilitates the creation of RLlib-IMPALA, a novel framework that efficiently uses RAY's resources to improve system adaptability and control. RLlib-IMPALA leverages RAY's toolkit to enhance analytical capabilities and significantly speeds up training to become more than 10 times faster than other state-of-the-art DRL methods.

Submitted 24 February, 2024; originally announced February 2024.

arXiv:2402.04584 [pdf, other]

Troublemaker Learning for Low-Light Image Enhancement

Authors: Yinghao Song, Zhiyuan Cao, Wanhong Xiang, Sifan Long, Bo Yang, Hongwei Ge, Yanchun Liang, Chunguo Wu

Abstract: Low-light image enhancement (LLIE) restores the color and brightness of underexposed images. Supervised methods suffer from high costs in collecting low/normal-light image pairs. Unsupervised methods invest substantial effort in crafting complex loss functions. We address these two challenges through the proposed TroubleMaker Learning (TML) strategy, which employs normal-light images as inputs for training. TML is simple: we first dim the input and then increase its brightness. TML is based on two core components. First, the troublemaker model (TM) constructs pseudo low-light images from normal images to relieve the cost of pairwise data. Second, the predicting model (PM) enhances the brightness of pseudo low-light images. Additionally, we incorporate an enhancing model (EM) to further improve the visual performance of PM outputs. Moreover, in LLIE tasks, characterizing global element correlations is important because more information on the same object can be captured. CNN cannot achieve this well, and self-attention has high time complexity. Accordingly, we propose Global Dynamic Convolution (GDC) with O(n) time complexity, which essentially imitates the partial calculation process of self-attention to formulate elementwise correlations. Based on the GDC module, we build the UGDC model. Extensive quantitative and qualitative experiments demonstrate that UGDC trained with TML can achieve competitive performance against state-of-the-art approaches on public datasets. The code is available at https://github.com/Rainbowman0/TML_LLIE.

Submitted 2 March, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.00398 [pdf, other]

Reconfigurable Intelligent Computational Surfaces for MEC-Assisted Autonomous Driving Networks

Authors: Bo Yang, Xueyao Zhang, Zhiwen Yu, Xuelin Cao, Chongwen Huang, George C. Alexandropoulos, Yan Zhang, Merouane Debbah, Chau Yuen

Abstract: In this paper, we focus on improving autonomous driving safety via task offloading from cellular vehicles (CVs), using vehicle-to-infrastructure (V2I) links, to a multi-access edge computing (MEC) server. Considering that the frequencies used for V2I links can be reused for vehicle-to-vehicle (V2V) communications to improve spectrum utilization, the receiver of each V2I link may suffer from severe interference, causing outages in the task offloading process. To tackle this issue, we propose the deployment of a reconfigurable intelligent computational surface (RICS) to enable not only V2I reflective links but also interference cancellation at the V2V links, exploiting the computational capability of its metamaterials. We devise a joint optimization formulation for the task offloading ratio between the CVs and the MEC server, the spectrum sharing strategy between V2V and V2I communications, as well as the RICS reflection and refraction matrices, with the objective to maximize a safety-based autonomous driving task. Due to the non-convexity of the problem and the coupling among its free variables, we transform it into a more tractable equivalent form, which is then decomposed into three sub-problems and solved via an alternate approximation method. Our simulation results demonstrate the effectiveness of the proposed RICS optimization in improving the safety in autonomous driving networks.

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.12546 [pdf, other]

On Building Myopic MPC Policies using Supervised Learning

Authors: Christopher A. Orrico, Bokan Yang, Dinesh Krishnamoorthy

Abstract: The application of supervised learning techniques in combination with model predictive control (MPC) has recently generated significant interest, particularly in the area of approximate explicit MPC, where function approximators like deep neural networks are used to learn the MPC policy via optimal state-action pairs generated offline. While the aim of approximate explicit MPC is to closely replicate the MPC policy, substituting online optimization with a trained neural network, the performance guarantees that come with solving the online optimization problem are typically lost. This paper considers an alternative strategy, where supervised learning is used to learn the optimal value function offline instead of learning the optimal policy. This can then be used as the cost-to-go function in a myopic MPC with a very short prediction horizon, such that the online computation burden reduces significantly without affecting the controller performance. This approach differs from existing work on value function approximations in the sense that it learns the cost-to-go function by using offline-collected state-value pairs, rather than closed-loop performance data. The cost of generating the state-value pairs used for training is addressed using a sensitivity-based data augmentation scheme.

Submitted 23 January, 2024; originally announced January 2024.

arXiv:2401.06485 [pdf, other]

Contrastive Learning With Audio Discrimination For Customizable Keyword Spotting In Continuous Speech

Authors: Yu Xi, Baochen Yang, Hao Li, Jiaqi Guo, Kai Yu

Abstract: Customizable keyword spotting (KWS) in continuous speech has attracted increasing attention due to its real-world application potential. While contrastive learning (CL) has been widely used to extract keyword representations, previous CL approaches all operate on pre-segmented isolated words and employ only an audio-text representation matching strategy. However, for KWS in continuous speech, co-articulation and streaming word segmentation can easily yield similar audio patterns for different texts, which may consequently trigger false alarms. To address this issue, we propose a novel CL with Audio Discrimination (CLAD) approach to learning keyword representation with both audio-text matching and audio-audio discrimination ability. Here, an InfoNCE loss considering both audio-audio and audio-text CL data pairs is employed for each sliding window during training. Evaluations on the open-source LibriPhrase dataset show that the use of sliding-window level InfoNCE loss yields performance comparable to previous CL approaches. Furthermore, experiments on the continuous speech dataset LibriSpeech demonstrate that, by incorporating audio discrimination, CLAD achieves significant performance gain over CL without audio discrimination. Meanwhile, compared to two-stage KWS approaches, the end-to-end KWS with CLAD achieves not only better performance, but also significant speed-up.

Submitted 12 January, 2024; originally announced January 2024.

Comments: Accepted by ICASSP2024
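
The InfoNCE loss named in the abstract above can be illustrated with a minimal sketch. This is a generic InfoNCE over cosine similarities, not the CLAD implementation: the function names `cosine` and `info_nce`, the temperature value, and the convention that candidate `i` is the positive for anchor `i` are all assumptions made for illustration. How CLAD combines audio-audio and audio-text pairs per sliding window is not specified in the abstract and is not modelled here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, candidates, temperature=0.1):
    """Mean InfoNCE loss (hypothetical helper).

    candidates[i] is treated as the positive for anchors[i];
    every other candidate acts as a negative for that anchor.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, c) / temperature for c in candidates]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the positive candidate.
        total += log_denominator - logits[i]
    return total / len(anchors)
```

In this sketch the anchors could be audio embeddings and the candidates either text embeddings (audio-text matching) or other audio embeddings (audio-audio discrimination); the loss is small when each anchor is most similar to its own positive.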

arXiv:2312.14473 [pdf, other]

Coordinated Active-Reactive Power Management of ReP2H Systems with Multiple Electrolyzers

Authors: Yangjun Zeng, Buxiang Zhou, Jie Zhu, Jiarong Li, Bosen Yang, ** Lin, Yiwei Qiu

Abstract: Utility-scale renewable power-to-hydrogen (ReP2H) production typically uses thyristor rectifiers (TRs) to supply power to multiple electrolyzers (ELZs). They exhibit a nonlinear and non-decouplable relation between active and reactive power. The on-off scheduling and load allocation of multiple ELZs simultaneously impact energy conversion efficiency and AC-side active and reactive power flow. Improper scheduling may result in excessive reactive power demand, causing voltage violations and increased network losses, compromising safety and economy. To address these challenges, this paper first explores trade-offs between the efficiency and the reactive load of the electrolyzers. Subsequently, we propose a coordinated approach for scheduling the active and reactive power in the ReP2H system. A mixed-integer second-order cone programming (MISOCP) model is established to jointly optimize active and reactive power by coordinating the ELZs, renewable energy sources, energy storage (ES), and var compensations. Case studies demonstrate that the proposed method reduces losses by 3.06% in an off-grid ReP2H system while increasing hydrogen production by 5.27% on average.

Submitted 22 December, 2023; originally announced December 2023.

arXiv:2312.12795 [pdf, ps, other]

doi 10.1109/TSG.2023.3326928

Joint Trading and Scheduling among Coupled Carbon-Electricity-Heat-Gas Industrial Clusters

Authors: Dafeng Zhu, Bo Yang, Yu Wu, Haoran Deng, Zhaoyang Dong, Kai Ma, ** Guan

Abstract: This paper presents a carbon-energy coupling management framework for an industrial park, where the carbon flow model accompanying multi-energy flows is adopted to track and suppress carbon emissions on the user side. To deal with the quadratic constraint of gas flows, a bound tightening algorithm for constraint relaxation is adopted. The synergies among carbon capture, energy storage, and power-to-gas further consume renewable energy and reduce carbon emissions. Aiming at carbon emissions disparities and supply-demand imbalances, this paper proposes a carbon trading ladder reward and punishment mechanism and an energy trading and scheduling method based on Lyapunov optimization and matching game to maximize the long-term benefits of each industrial cluster without knowing the prior information of random variables. Case studies show that our proposed trading method can reduce overall costs and carbon emissions while relieving energy pressure, which is important for Environmental, Social and Governance (ESG).

Submitted 20 December, 2023; originally announced December 2023.

Journal ref: IEEE Transactions on Smart Grid, 2023

arXiv:2312.12789 [pdf, other]

SLP-Net: An efficient lightweight network for segmentation of skin lesions

Authors: Bo Yang, Hong Peng, Chenggang Guo, Xiaohui Luo, Jun Wang, Xianzhong Long

Abstract: Prompt treatment for melanoma is crucial. To assist physicians in identifying lesion areas precisely and quickly, we propose a novel skin lesion segmentation technique, namely SLP-Net, an ultra-lightweight segmentation network based on the spiking neural P (SNP) systems type mechanism. Most existing convolutional neural networks achieve high segmentation accuracy while neglecting the high hardware cost. SLP-Net, on the contrary, has a very small number of parameters and a high computation speed. We design a lightweight multi-scale feature extractor without the usual encoder-decoder structure. Rather than a decoder, a feature adaptation module is designed to replace it and implement multi-scale information decoding. Experiments at the ISIC2018 challenge demonstrate that the proposed model has the highest Acc and DSC among the state-of-the-art methods, while experiments on the PH2 dataset also demonstrate a favorable generalization ability. Finally, we compare the computational complexity as well as the computational speed of the models in experiments, where SLP-Net has the highest overall superiority.

Submitted 4 January, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

arXiv:2312.07631 [pdf, other]

AI-driven projection tomography with multicore fibre-optic cell rotation

Authors: Jiawei Sun, Bin Yang, Nektarios Koukourakis, Jochen Guck, Juergen W. Czarske

Abstract: Optical tomography has emerged as a non-invasive imaging method, providing three-dimensional insights into subcellular structures and thereby enabling a deeper understanding of cellular functions, interactions, and processes. Conventional optical tomography methods are constrained by a limited illumination scanning range, leading to anisotropic resolution and incomplete imaging of cellular structures. To overcome this problem, we employ a compact multi-core fibre-optic cell rotator system that facilitates precise optical manipulation of cells within a microfluidic chip, achieving full-angle projection tomography with isotropic resolution. Moreover, we demonstrate an AI-driven tomographic reconstruction workflow, which can be a paradigm shift from conventional computational methods, often demanding manual processing, to a fully autonomous process. The performance of the proposed cell rotation tomography approach is validated through the three-dimensional reconstruction of cell phantoms and HL60 human cancer cells. The versatility of this learning-based tomographic reconstruction workflow paves the way for its broad application across diverse tomographic imaging modalities, including but not limited to flow cytometry tomography and acoustic rotation tomography. Therefore, this AI-driven approach can propel advancements in cell biology, aiding in the inception of pioneering therapeutics, and augmenting early-stage cancer diagnostics.

Submitted 12 December, 2023; originally announced December 2023.

Comments: 15 pages, 6 figures

arXiv:2312.00476 [pdf, other]

Self-Supervised Learning of Spatial Acoustic Representation with Cross-Channel Signal Reconstruction and Multi-Channel Conformer

Authors: Bing Yang, Xiaofei Li

Abstract: Supervised learning methods have shown effectiveness in estimating spatial acoustic parameters such as time difference of arrival, direct-to-reverberant ratio and reverberation time. However, they still suffer from the simulation-to-reality generalization problem due to the mismatch between simulated and real-world acoustic characteristics and the deficiency of annotated real-world data. To this end, this work proposes a self-supervised method that takes full advantage of unlabeled data for spatial acoustic parameter estimation. First, a new pretext task, i.e. cross-channel signal reconstruction (CCSR), is designed to learn a universal spatial acoustic representation from unlabeled multi-channel microphone signals. We mask partial signals of one channel and ask the model to reconstruct them, which makes it possible to learn spatial acoustic information from unmasked signals and extract source information from the other microphone channel. An encoder-decoder structure is used to disentangle the two kinds of information. By fine-tuning the pre-trained spatial encoder with a small annotated dataset, this encoder can be used to estimate spatial acoustic parameters. Second, a novel multi-channel audio Conformer (MC-Conformer) is adopted as the encoder model architecture, which is suitable for both the pretext and downstream tasks. It is carefully designed to be able to capture the local and global characteristics of spatial acoustics exhibited in the time-frequency domain. Experimental results of five acoustic parameter estimation tasks on both simulated and real-world data show the effectiveness of the proposed method. To the best of our knowledge, this is the first self-supervised learning method in the field of spatial acoustic representation learning and multi-channel audio signal processing.

Submitted 1 December, 2023; originally announced December 2023.

arXiv:2311.18520 [pdf, other]

Calibration-free online test-time adaptation for electroencephalography motor imagery decoding

Authors: Martin Wimpff, Mario Döbler, Bin Yang

Abstract: Providing a promising pathway to link the human brain with external devices, Brain-Computer Interfaces (BCIs) have seen notable advancements in decoding capabilities, primarily driven by increasingly sophisticated techniques, especially deep learning. However, achieving high accuracy in real-world scenarios remains a challenge due to the distribution shift between sessions and subjects. In this paper we will explore the concept of online test-time adaptation (OTTA) to continuously adapt the model in an unsupervised fashion during inference time. Our approach guarantees the preservation of privacy by eliminating the requirement to access the source data during the adaptation process. Additionally, OTTA achieves calibration-free operation by not requiring any session- or subject-specific data. We will investigate the task of electroencephalography (EEG) motor imagery decoding using a lightweight architecture together with different OTTA techniques like alignment, adaptive batch normalization, and entropy minimization. We examine two datasets and three distinct data settings for a comprehensive analysis. Our adaptation methods produce state-of-the-art results, potentially instigating a shift in transfer learning for BCI decoding towards online adaptation.

Submitted 8 January, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

Comments: 6 pages, 4 figures, 12th International Winter Conference on Brain-Computer Interface 2024

arXiv:2310.04231 [pdf, other]

Indoor Positioning based on Active Radar Sensing and Passive Reflectors: Concepts & Initial Results

Authors: Pascal Schlachter, Zhibin Yu, Naveed Iqbal, Xiaofeng Wu, Sven Hinderer, Bin Yang

Abstract: To navigate reliably in indoor environments, an industrial autonomous vehicle must know its position. However, current indoor vehicle positioning technologies either lack accuracy, usability or are too expensive. Thus, we propose a novel concept called local reference point assisted active radar positioning, which is able to overcome these drawbacks. It is based on distributing passive retroreflectors in the indoor environment such that each position of the vehicle can be identified by a unique reflection characteristic regarding the reflectors. To observe these characteristics, the autonomous vehicle is equipped with an active radar system. On one hand, this paper presents the basic idea and concept of our new approach towards indoor vehicle positioning and especially focuses on the crucial placement of the reflectors. On the other hand, it also provides a proof of concept by conducting a full system simulation including the placement of the local reference points, the radar-based distance estimation and the comparison of two different positioning methods. It successfully demonstrates the feasibility of our proposed approach.

Submitted 31 January, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

Comments: Accepted as a work-in-progress paper at the 13th International Conference on Indoor Positioning and Indoor Navigation (IPIN 2023)

Journal ref: Proceedings of the Work-in-Progress Papers at the 13th International Conference on Indoor Positioning and Indoor Navigation (IPIN-WiP 2023), September 25 - 28, 2023, Nuremberg, Germany (https://ceur-ws.org/Vol-3581/)

arXiv:2309.14274 [pdf]

Analysis and Experimental Validation of the WPT Efficiency of the Both-Sides Retrodirective System

Authors: Charleston Dale M. Ambatali, Shinichi Nakasuka, Bo Yang, Naoki Shinohara

Abstract: The retrodirective antenna array is considered as a mechanism to enable target tracking of a power receiver for long range wireless power transfer (WPT) due to its simplicity in implementation using only analog circuits. By installing the retrodirective capability on both the generator and rectenna arrays, a feedback loop that produces a high efficiency WPT channel is created. In this paper, we characterize the dynamics of this phenomenon using a discrete-time state-space model based on S-parameters and show that the system can naturally achieve maximum theoretical WPT efficiency. We further confirmed the theoretical analysis through a hardware experiment using a 12-port circuit board with measurable S-parameters mimicking a static wireless channel. The results collected from the hardware experiment show agreement with the proposed theoretical framework by comparing the theoretical efficiency with the measured efficiency and by showing that the collected data points follow the predicted condition to achieve maximum efficiency.

Submitted 27 March, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

Comments: This current version has been submitted to the Space Solar Power and Wireless Transmission on February 19, 2024 for possible publication. Compared to the previous version, this version is a major revision discussing existing works more thoroughly to the proposed idea and also adding more detail to the experiment setup so it can be reproducible

arXiv:2309.13515 [pdf, other]

Learning-based Inverse Perception Contracts and Applications

Authors: Dawei Sun, Benjamin C. Yang, Sayan Mitra

Abstract: Perception modules are integral in many modern autonomous systems, but their accuracy can be subject to the vagaries of the environment. In this paper, we propose a learning-based approach that can automatically characterize the error of a perception module from data and use this for safe control. The proposed approach constructs an inverse perception contract (IPC) which generates a set that contains the ground-truth value that is being estimated by the perception module, with high probability. We apply the proposed approach to study a vision pipeline deployed on a quadcopter. With the proposed approach, we successfully constructed an IPC for the vision pipeline. We then designed a control algorithm that utilizes the learned IPC, with the goal of landing the quadcopter safely on a landing pad. Experiments show that with the learned IPC, the control algorithm safely landed the quadcopter despite the error from the perception module, while the baseline algorithm without using the learned IPC failed to do so.

Submitted 3 March, 2024; v1 submitted 23 September, 2023; originally announced September 2023.

arXiv:2309.05964 [pdf, other]

Massive Access of Static and Mobile Users via Reconfigurable Intelligent Surfaces: Protocol Design and Performance Analysis

Authors: Xuelin Cao, Bo Yang, Chongwen Huang, George C. Alexandropoulos, Chau Yuen, Zhu Han, H. Vincent Poor, Lajos Hanzo

Abstract: The envisioned wireless networks of the future entail the provisioning of massive numbers of connections, heterogeneous data traffic, ultra-high spectral efficiency, and low latency services. This vision is spurring research activities focused on defining a next generation multiple access (NGMA) protocol that can accommodate massive numbers of users in different resource blocks, thereby achieving higher spectral efficiency and increased connectivity compared to conventional multiple access schemes. In this article, we present a multiple access scheme for NGMA in wireless communication systems assisted by multiple reconfigurable intelligent surfaces (RISs). In this regard, considering the practical scenario of static users operating together with mobile ones, we first study the interplay of the design of NGMA schemes and RIS phase configuration in terms of efficiency and complexity. Based on this, we then propose a multiple access framework for RIS-assisted communication systems, and we also design a medium access control (MAC) protocol incorporating RISs. In addition, we give a detailed performance analysis of the designed RIS-assisted MAC protocol. Our extensive simulation results demonstrate that the proposed MAC design outperforms the benchmarks in terms of system throughput and access fairness, and also reveal a trade-off relationship between the system throughput and fairness.

Submitted 12 September, 2023; originally announced September 2023.

arXiv:2309.01168 [pdf, other]

Noise Measurement of a Wind Turbine using Thick Blades with Blunt Trailing Edge

Authors: Weicheng Xue, Bing Yang

Abstract: The noise generated by wind turbines can potentially cause significant harm to the ecological environment and the living conditions of residents. Therefore, a proper assessment of wind turbine noise is crucial. The IEC 61400-11 standard provides standardized guidelines for measuring turbine noise, facilitating the comparison of noise characteristics among different wind turbine models. This work aims to conduct a comprehensive noise measurement of a 100kW wind turbine using thick blades with blunt trailing edge, which differs from the typical turbines studied previously. The work takes into account the unique design and dynamic characteristics of small-scale wind turbines and adjusts the measurement accordingly; deviations from the IEC standards are explicitly addressed.

Submitted 3 September, 2023; originally announced September 2023.

arXiv:2309.00907 [pdf, other]

A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading

Authors: Ruihuai Liang, Bo Yang, Zhiwen Yu, Xuelin Cao, Derrick Wing Kwan Ng, Chau Yuen

Abstract: Computation offloading has become a popular solution to support computationally intensive and latency-sensitive applications by transferring computing tasks to mobile edge servers (MESs) for execution, which is known as mobile/multi-access edge computing (MEC). To improve the MEC performance, it is required to design an optimal offloading strategy that includes offloading decision (i.e., whether offloading or not) and computational resource allocation of MEC. The design can be formulated as a mixed-integer nonlinear programming (MINLP) problem, which is generally NP-hard and its effective solution can be obtained by performing online inference through a well-trained deep neural network (DNN) model. However, when the system environments change dynamically, the DNN model may lose efficacy due to the drift of input parameters, thereby decreasing the generalization ability of the DNN model. To address this unique challenge, in this paper, we propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs). Specifically, the shared backbone will be invariant during the PHs training and the inferred results will be ensembled, thereby significantly reducing the required training overhead and improving the inference performance. As a result, the joint optimization problem for offloading decision and resource allocation can be efficiently solved even in a time-varying wireless environment. Experimental results show that the proposed MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.

Submitted 2 September, 2023; originally announced September 2023.

arXiv:2308.06958 [pdf, other]

Hydrogen Supply Infrastructure Network Planning Approach towards Chicken-egg Conundrum

Authors: Haoran Deng, Bo Yang, Mo-Yuen Chow, Gang Yao, Cailian Chen, ** Guan

Abstract: In the early commercialization stage of hydrogen fuel cell vehicles (HFCVs), reasonable hydrogen supply infrastructure (HSI) planning decisions are a premise for promoting the popularization of HFCVs. However, there is a strong causality between HFCVs and hydrogen refueling stations (HRSs): the planning decisions of HRSs could affect the hydrogen refueling demand of HFCVs, and the growth of demand would in turn stimulate the further investment in HRSs, which is also known as the "chicken and egg" conundrum. Meanwhile, the hydrogen demand is uncertain with insufficient prior knowledge, and thus there is a decision-dependent uncertainty (DDU) in the planning issue. This poses great challenges to solving the optimization problem. To this end, this work establishes a multi-network HSI planning model coordinating hydrogen, power, and transportation networks. Then, to reflect the causal relationship between HFCVs and HRSs effectively without sufficient historical data, a distributionally robust optimization framework with decision-dependent uncertainty is developed. The uncertainty of hydrogen demand is modeled as a Wasserstein ambiguity set with a decision-dependent empirical probability distribution. Subsequently, to reduce the computational complexity caused by the introduction of a large number of scenarios and high-dimensional nonlinear constraints, we developed an improved distribution shaping method and techniques of scenario and variable reduction to derive the solvable form with less computing burden. Finally, the simulation results demonstrate that this method can reduce costs by at least 10.4% compared with traditional methods and will be more effective in large-scale HSI planning issues. Further, we put forward effective suggestions for the policymakers and investors to formulate relevant policies and decisions.

Submitted 14 August, 2023; originally announced August 2023.

arXiv:2306.14276 [pdf, other]

Aeroacoustic Source Localization

Authors: Weicheng Xue, Bing Yang, Shaohong Jia

Abstract: The deconvolutional DAMAS algorithm can effectively eliminate the misconceptions in the usually-used beamforming localization algorithm, allowing for more accurate calculation of the source location as well as the intensity. When solving a linear system of equations, the DAMAS algorithm takes into account the mutual influence of different locations, reducing or even eliminating sidelobes and producing more accurate results. This work first introduces the principles of the DAMAS algorithm. Then it applies both the beamforming algorithm and the DAMAS algorithm to simulate the localization of a single-frequency source from a 1.5 MW wind turbine, a complex line source with the text "UCAS" and a line source downstream of an airfoil trailing edge. Finally, the work presents experimental localization results of the source of a 1.5 MW wind turbine using both the beamforming algorithm and the DAMAS algorithm.

Submitted 5 July, 2023; v1 submitted 25 June, 2023; originally announced June 2023.
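
The linear-system step described in the DAMAS abstract above can be sketched as follows. DAMAS is commonly presented as solving A x = b for non-negative source strengths x, where b is the beamforming map and A describes how a source at one grid point leaks into the beamformer output at another, using Gauss-Seidel sweeps with clipping at zero. The function name `damas`, the dense list-of-lists representation, and the toy matrix in the test are assumptions for illustration; this is not the paper's code.

```python
def damas(A, b, iterations=200):
    """Non-negative Gauss-Seidel sweeps for A x = b (illustrative sketch).

    A: square matrix (list of lists) of point-spread coefficients,
       with non-zero diagonal entries.
    b: beamforming output at each grid point.
    Returns the estimated non-negative source strengths.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        for i in range(n):
            # Remove the contribution of all other grid points, using the
            # most recent estimates (Gauss-Seidel update).
            residual = b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)
            # Source strengths cannot be negative, so clip at zero.
            x[i] = max(0.0, residual / A[i][i])
    return x
```

For a single source between two neighbouring grid points, the beamforming map b shows non-zero values at all three points (the sidelobe effect), while the sweeps above concentrate the strength back onto the true source location.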

arXiv:2306.12604 [pdf, other]

Consecutive Inertia Drift of Autonomous RC Car via Primitive-based Planning and Data-driven Control

Authors: Yiwen Lu, Bo Yang, Jiayun Li, Yihan Zhou, Hongshuai Chen, Yilin Mo

Abstract: Inertia drift is an aggressive transitional driving maneuver, which is challenging due to the high nonlinearity of the system and the stringent requirement on control and planning performance. This paper presents a solution for the consecutive inertia drift of an autonomous RC car based on primitive-based planning and data-driven control. The planner generates complex paths via the concatenation of path segments called primitives, and the controller eases the burden on feedback by interpolating between multiple real trajectories with different initial conditions into one near-feasible reference trajectory. The proposed strategy is capable of drifting through various paths containing consecutive turns, which is validated in both simulation and reality.

Submitted 21 June, 2023; originally announced June 2023.

Comments: 9 pages, 10 figures, to appear at IROS 2023

arXiv:2305.19610 [pdf, other]

FN-SSL: Full-Band and Narrow-Band Fusion for Sound Source Localization

Authors: Yabo Wang, Bing Yang, Xiaofei Li

Abstract: Extracting direct-path spatial features is critical for sound source localization in adverse acoustic environments. This paper proposes a full-band and narrow-band fusion network for estimating direct-path inter-channel phase difference (DP-IPD) from microphone signals. The alternating full-band and narrow-band layers are responsible for learning the full-band correlation and narrow-band extraction of DP-IPD, respectively. Experiments show that the proposed network noticeably outperforms other advanced methods on both simulated and real-world data.

Submitted 31 May, 2023; originally announced May 2023.

arXiv:2305.17937 [pdf, other]

Attention Mechanisms in Medical Image Segmentation: A Survey

Authors: Yutong Xie, Bing Yang, Qingbiao Guan, Jianpeng Zhang, Qi Wu, Yong Xia

Abstract: Medical image segmentation plays an important role in computer-aided diagnosis. Attention mechanisms that distinguish important parts from irrelevant parts have been widely used in medical image segmentation tasks. This paper systematically reviews the basic principles of attention mechanisms and their applications in medical image segmentation. First, we review the basic concepts of attention mechanism and formulation. Second, we surveyed over 300 articles related to medical image segmentation, and divided them into two groups based on their attention mechanisms, non-Transformer attention and Transformer attention. In each group, we deeply analyze the attention mechanisms from three aspects based on the current literature work, i.e., the principle of the mechanism (what to use), implementation methods (how to use), and application tasks (where to use). We also thoroughly analyzed the advantages and limitations of their applications to different tasks. Finally, we summarize the current state of research and shortcomings in the field, and discuss the potential challenges in the future, including task specificity, robustness, standard evaluation, etc. We hope that this review can showcase the overall research context of traditional and Transformer attention methods, provide a clear reference for subsequent research, and inspire more advanced attention research, not only in medical image segmentation, but also in other image analysis scenarios.

Submitted 29 May, 2023; originally announced May 2023.

Comments: Submitted to Medical Image Analysis, survey paper, 34 pages, over 300 references

arXiv:2305.09647 [pdf, other]

Wavelet-based Unsupervised Label-to-Image Translation

Authors: George Eskandar, Mohamed Abdelsamad, Karim Armanious, Shuai Zhang, Bin Yang

Abstract: Semantic Image Synthesis (SIS) is a subclass of image-to-image translation where a semantic layout is used to generate a photorealistic image. State-of-the-art conditional Generative Adversarial Networks (GANs) need a huge amount of paired data to accomplish this task while generic unpaired image-to-image translation frameworks underperform in comparison, because they color-code semantic layouts and learn correspondences in appearance instead of semantic content. Starting from the assumption that a high quality generated image should be segmented back to its semantic layout, we propose a new Unsupervised paradigm for SIS (USIS) that makes use of a self-supervised segmentation loss and whole image wavelet based discrimination. Furthermore, in order to match the high-frequency distribution of real images, a novel generator architecture in the wavelet domain is proposed. We test our methodology on 3 challenging datasets and demonstrate its ability to bridge the performance gap between paired and unpaired models.

Submitted 16 May, 2023; originally announced May 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2109.14715

arXiv:2305.00170 [pdf, other]

Enhancing multilingual speech recognition in air traffic control by sentence-level language identification

Authors: Peng Fan, Dongyue Guo, JianWei Zhang, Bo Yang, Yi Lin

Abstract: Automatic speech recognition (ASR) technique is becoming increasingly popular to improve the efficiency and safety of air traffic control (ATC) operations. However, the conversation between ATC controllers and pilots using multilingual speech brings a great challenge to building high-accuracy ASR systems. In this work, we present a two-stage multilingual ASR framework. The first stage is to train a language identifier (LID) that is based on a recurrent neural network (RNN) to obtain sentence language identification in the form of one-hot encoding. The second stage aims to train an RNN-based end-to-end multilingual recognition model that utilizes sentence language features generated by LID to enhance input features. In this work, we introduce Featurewise Linear Modulation (FiLM) to improve the performance of multilingual ASR by utilizing sentence language identification. Furthermore, we introduce a new sentence language identification learning module called SLIL, which consists of a FiLM layer and a Squeeze-and-Excitation Networks layer. Extensive experiments on the ATCSpeech dataset show that our proposed method outperforms the baseline model. Compared to the vanilla FiLMed backbone model, the proposed multilingual ASR model obtains about 7.50% character error rate relative performance improvement.

Submitted 29 April, 2023; originally announced May 2023.

arXiv:2303.11420 [pdf, other]

ADCNet: Learning from Raw Radar Data via Distillation

Authors: Bo Yang, Ishan Khatri, Michael Happold, Chulong Chen

Abstract: As autonomous vehicles and advanced driving assistance systems have entered wider deployment, there is an increased interest in building robust perception systems using radars. Radar-based systems are lower cost and more robust to adverse weather conditions than their LiDAR-based counterparts; however the point clouds produced are typically noisy and sparse by comparison. In order to combat these challenges, recent research has focused on consuming the raw radar data, instead of the final radar point cloud. We build on this line of work and demonstrate that by bringing elements of the signal processing pipeline into our network and then pre-training on the signal processing task, we are able to achieve state of the art detection performance on the RADIal dataset. Our method uses expensive offline signal processing algorithms to pseudo-label data and trains a network to distill this information into a fast convolutional backbone, which can then be finetuned for perception tasks. Extensive experiment results corroborate the effectiveness of the proposed techniques.

Submitted 13 December, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

Comments: Update 12/13/2023: upgrade organization and presentation of the paper, adding appendix

arXiv:2301.00656 [pdf, other]

TriNet: stabilizing self-supervised learning from complete or slow collapse on ASR

Authors: Lixin Cao, Jun Wang, Ben Yang, Dan Su, Dong Yu

Abstract: Self-supervised learning (SSL) models confront challenges of abrupt informational collapse or slow dimensional collapse. We propose TriNet, which introduces a novel triple-branch architecture for preventing collapse and stabilizing the pre-training. TriNet learns the SSL latent embedding space and incorporates it to a higher level space for predicting pseudo target vectors generated by a frozen teacher. Our experimental results show that the proposed method notably stabilizes and accelerates pre-training and achieves a relative word error rate reduction (WERR) of 6.06% compared to the state-of-the-art (SOTA) Data2vec for a downstream benchmark ASR task. We will release our code at https://github.com/tencent-ailab/.

Submitted 14 March, 2023; v1 submitted 12 December, 2022; originally announced January 2023.

Comments: Accepted by ICASSP 2023

arXiv:2212.14183 [pdf, other]

doi 10.1109/TSC.2022.3230699

How to Share: Balancing Layer and Chain Sharing in Industrial Microservice Deployment

Authors: Yuxiang Liu, Bo Yang, Yu Wu, Cailian Chen, ** Guan

Abstract: With the rapid development of smart manufacturing, edge computing-oriented microservice platforms are emerging as an important part of production control. In the containerized deployment of microservices, layer sharing can reduce the huge bandwidth consumption caused by image pulling, and chain sharing can reduce communication overhead caused by communication between microservices. The two sharing methods use the characteristics of each microservice to share resources during deployment. However, due to the limited resources of edge servers, it is difficult to meet the optimization goals of the two methods at the same time. Therefore, it is of critical importance to realize the improvement of service response efficiency by balancing the two sharing methods. This paper studies the optimal microservice deployment strategy that can balance layer sharing and chain sharing of microservices. We build a problem that minimizes microservice image pull delay and communication overhead and transform the problem into a linearly constrained integer quadratic programming problem through model reconstruction. A deployment strategy is obtained through the successive convex approximation (SCA) method. Experimental results show that the proposed deployment strategy can balance the two resource sharing methods. When the two sharing methods are equally considered, the average image pull delay can be reduced to 65% of the baseline, and the average communication overhead can be reduced to 30% of the baseline.

Submitted 29 December, 2022; originally announced December 2022.

arXiv:2212.01770 [pdf, other]

doi 10.1016/j.ijepes.2022.108851

Distributionally Robust Day-ahead Scheduling for Power-traffic Network under a Potential Game Framework

Authors: Haoran Deng, Bo Yang, Chao Ning, Cailian Chen, ** Guan

Abstract: Widespread utilization of electric vehicles (EVs) incurs more uncertainties and impacts on the scheduling of the power-transportation coupled network. This paper investigates optimal power scheduling for a power-transportation coupled network in the day-ahead energy market considering multiple uncertainties related to photovoltaic (PV) generation and the traffic demand of vehicles. The crux of this problem is to model the coupling relation between the two networks in the day-ahead scheduling stage and consider the intra-day spatial uncertainties of the source and load. Meanwhile, the flexible load with a certain adjustment margin is introduced to ensure the balance of supply and demand of power nodes and consume the renewable energy better. Furthermore, we show the interactions between the power system and EV users from a potential game-theoretic perspective, where the uncertainties are characterized by an ambiguity set. In order to ensure the individual optimality of the two networks in a unified framework in day-ahead power scheduling, a two-stage distributionally robust centralized optimization model is established to carry out the equilibrium of power-transportation coupled network. On this basis, a combination of the duality theory and the Benders decomposition is developed to solve the distributionally robust optimization (DRO) model. Simulations demonstrate that the proposed approach can obtain individual optimal and less conservative strategies.

Submitted 4 December, 2022; originally announced December 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2110.14209

Journal ref: International Journal of Electrical Power and Energy Systems 2023

arXiv:2211.15752 [pdf, other]

Hierarchical Control Strategy for Moving A Robot Manipulator Between Small Containers

Authors: Paolo Torrado, Boling Yang, Joshua Smith

Abstract: In this paper, we study the implementation of a model predictive controller (MPC) for the task of object manipulation in a highly uncertain environment (e.g., picking objects from a semi-flexible array of densely packed bins). As a real-time perception-driven feedback controller, MPC is robust to the uncertainties in this environment. However, our experiment shows MPC cannot control a robot to complete a sequence of motions in a heavily occluded environment due to its myopic nature. It will benefit from adding a high-level policy that adaptively adjusts the optimization problem for MPC.

Submitted 28 November, 2022; originally announced November 2022.

arXiv:2209.11112 [pdf, other]

doi 10.1109/TASLP.2024.3393718

CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement

Authors: Sherif Abdulatif, Ruizhe Cao, Bin Yang

Abstract: In this work, we further develop the conformer-based metric generative adversarial network (CMGAN) model for speech enhancement (SE) in the time-frequency (TF) domain. This paper builds on our previous work but takes a more in-depth look by conducting extensive ablation studies on model inputs and architectural design choices. We rigorously tested the generalization ability of the model to unseen noise types and distortions. We have fortified our claims through DNS-MOS measurements and listening tests. Rather than focusing exclusively on the speech denoising task, we extend this work to address the dereverberation and super-resolution tasks. This necessitated exploring various architectural changes, specifically metric discriminator scores and masking techniques. It is essential to highlight that this is among the earliest works that attempted complex TF-domain super-resolution. Our findings show that CMGAN outperforms existing state-of-the-art methods in the three major speech enhancement tasks: denoising, dereverberation, and super-resolution. For example, in the denoising task using the Voice Bank+DEMAND dataset, CMGAN notably exceeded the performance of prior models, attaining a PESQ score of 3.41 and an SSNR of 11.10 dB. Audio samples and CMGAN implementations are available online.

Submitted 3 May, 2024; v1 submitted 22 September, 2022; originally announced September 2022.

Comments: 17 pages, 11 figures, and 6 tables. arXiv admin note: text overlap with arXiv:2203.15149

Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 2477-2493, 2024

arXiv:2208.08894 [pdf]

EEG Machine Learning for Analysis of Mild Traumatic Brain Injury: A survey

Authors: Weiqing Gu, Ryan Chang, Bohan Yang

Abstract: Mild Traumatic Brain Injury (mTBI) is a common brain injury and affects a diverse group of people: soldiers, constructors, athletes, drivers, children, elders, and nearly everyone. Thus, having a well-established, fast, cheap, and accurate classification method is crucial for the well-being of people around the globe. Luckily, using Machine Learning (ML) on electroencephalography (EEG) data shows promising results. This survey analyzed the most cutting-edge articles from 2017 to the present. The articles were searched from the Google Scholar database and went through an elimination process based on our criteria. We reviewed, summarized, and compared the fourteen most cutting-edge machine learning research papers for predicting and classifying mTBI in terms of 1) EEG data types, 2) data preprocessing methods, 3) machine learning feature representations, 4) feature extraction methods, and 5) machine learning classifiers and predictions. The most common EEG data type was human resting-state EEG, with most studies using filters to clean the data. The power spectral, especially alpha and theta power, was the most prevalent feature. The other non-power spectral features, such as entropy, also show their great potential. The Fourier transform is the most common feature extraction method while using neural networks as automatic feature extraction generally returns a high accuracy result. Lastly, Support Vector Machine (SVM) was our survey's most common ML classifier due to its lower computational complexity and solid mathematical theoretical basis.
The purpose of this study was to collect and explore a sparsely populated sector of ML, and we hope that our survey has shined some light on the inherent trends, advantages, disadvantages, and preferences of the current state of machine learning-based EEG analysis for mTBI.

Submitted 10 August, 2022; originally announced August 2022.

Comments: 27 pages

arXiv:2208.04509 [pdf, other]

Reconfigurable Intelligent Computational Surfaces: When Wave Propagation Control Meets Computing

Authors: Bo Yang, Xuelin Cao, **dan Xu, Chongwen Huang, George C. Alexandropoulos, Linglong Dai, Mérouane Debbah, H. Vincent Poor, Chau Yuen

Abstract: The envisioned sixth-generation (6G) of wireless networks will involve an intelligent integration of communications and computing, thereby meeting the urgent demands of diverse applications. To realize the concept of the smart radio environment, reconfigurable intelligent surfaces (RISs) are a promising technology for offering programmable propagation of impinging electromagnetic signals via external control. However, the purely reflective nature of conventional RISs induces significant challenges in supporting computation-based applications, e.g., wave-based calculation and signal processing. To fulfil future communication and computing requirements, new materials are needed to complement the existing technologies of metasurfaces, enabling further diversification of electronics and their applications. In this event, we introduce the concept of reconfigurable intelligent computational surface (RICS), which is composed of two reconfigurable multifunctional layers: the `reconfigurable beamforming layer' which is responsible for tunable signal reflection, absorption, and refraction, and the `intelligence computation layer' that concentrates on metamaterials-based computing. By exploring the recent trends on computational metamaterials, RICSs have the potential to make joint communication and computation a reality. We further demonstrate two typical applications of RICSs for performing wireless spectrum sensing and secrecy signal processing. Future research challenges arising from the design and operation of RICSs are finally highlighted.

Submitted 3 October, 2022; v1 submitted 8 August, 2022; originally announced August 2022.

arXiv:2207.00615 [pdf, other]

Synthesis of General Decoupling Networks Using Transmission Lines

Authors: Binbin Yang

Abstract: In this paper, we introduce a synthesis technique for transmission line based decoupling networks, which find application in coupled systems such as multiple-antenna systems and antenna arrays. Employing the generalized $π$-network and the transmission line analysis technique, we reduce the decoupling network design into simple matrix calculations. The synthesized decoupling network is essentially a generalized $π$-network with transmission lines at all branches. The advantage of this proposed decoupling network is that it can be implemented using transmission lines, ensuring better control on loss, performance consistency and higher power handling capability, when compared with lumped components, and can be easily scaled for operation at different frequencies.

Submitted 1 July, 2022; originally announced July 2022.

Comments: 4 pages

arXiv:2206.00208 [pdf, other]

AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation

Authors: Kun Song, Heyang Xue, Xinsheng Wang, Jian Cong, Yongmao Zhang, Lei Xie, Bing Yang, Xiong Zhang, Dan Su

Abstract: Speaker adaptation in text-to-speech synthesis (TTS) is to finetune a pre-trained TTS model to adapt to new target speakers with limited data. While much effort has been conducted towards this task, seldom work has been performed for low computational resource scenarios due to the challenges raised by the requirement of the lightweight model and less computational complexity. In this paper, a tiny VITS-based TTS model, named AdaVITS, for low computing resource speaker adaptation is proposed. To effectively reduce parameters and computational complexity of VITS, an iSTFT-based wave construction decoder is proposed to replace the upsampling-based decoder which is resource-consuming in the original VITS. Besides, NanoFlow is introduced to share the density estimate across flow blocks to reduce the parameters of the prior encoder. Furthermore, to reduce the computational complexity of the textual encoder, scaled-dot attention is replaced with linear attention. To deal with the instability caused by the simplified model, instead of using the original text encoder, phonetic posteriorgram (PPG) is utilized as linguistic feature via a text-to-PPG module, which is then used as input for the encoder. Experiment shows that AdaVITS can generate stable and natural speech in speaker adaptation with 8.97M model parameters and 0.72GFlops computational complexity.

Submitted 2 November, 2022; v1 submitted 31 May, 2022; originally announced June 2022.

Comments: Accepted by ISCSLP 2022

arXiv:2204.14059 [pdf]

Improving the estimation of directional area scattering factor (DASF) from canopy reflectance: theoretical basis and validation

Authors: Yi Lin, Siyuan Liu, Lei Yan, Kai Yan, Yelu Zeng, Bin Yang

Abstract: Directional area scattering factor (DASF) is a critical canopy structural parameter for vegetation monitoring. It provides an efficient tool for decoupling of canopy structure and leaf optics from canopy reflectance. Current standard approach to estimate DASF from canopy bidirectional reflectance factor (BRF) is based on the assumption that in the weakly absorbing 710 to 790 nm spectral interval, leaf scattering does not change much with the concentration of dry matter and thus its variation can be neglected. This results in biased estimates of DASF and consequently leads to uncertainty in DASF-related applications. This study proposes a new approach to account for variations in concentrations of this biochemical constituent, which additionally uses the canopy BRF at 2260 nm. In silico analysis of the proposed approach suggests significant increase in accuracy over the standard technique by a relative root mean square error (rRMSE) of 49% and 34% for one- and three dimensional scenes, respectively. When compared with indoor multi-angular hyperspectral measurements reported in literature, the mean absolute error has reduced by 68% for needle leaf and 20% for broadleaf canopies. Thus, the proposed DASF estimation approach outperforms the current one and can be used more reliably in DASF-related applications, such as vegetation monitoring of functional traits, dynamics, and radiation budget.

Submitted 27 April, 2022; originally announced April 2022.

arXiv:2204.04088 [pdf, other]

Stochastic Gradient-based Fast Distributed Multi-Energy Management for an Industrial Park with Temporally-Coupled Constraints

Authors: Dafeng Zhu, Bo Yang, Chengbin Ma, Zhaojian Wang, Shanying Zhu, Kai Ma, ** Guan

Abstract: Contemporary industrial parks are challenged by the growing concerns about high cost and low efficiency of energy supply. Moreover, in the case of uncertain supply/demand, how to mobilize delay-tolerant elastic loads and compensate real-time inelastic loads to match multi-energy generation/storage and minimize energy cost is a key issue. Since energy management is hardly to be implemented offline without knowing statistical information of random variables, this paper presents a systematic online energy cost minimization framework to fulfill the complementary utilization of multi-energy with time-varying generation, demand and price. Specifically to achieve charging/discharging constraints due to storage and short-term energy balancing, a fast distributed algorithm based on stochastic gradient with two-timescale implementation is proposed to ensure online implementation. To reduce the peak loads, an incentive mechanism is implemented by estimating users' willingness to shift. Analytical results on parameter setting are also given to guarantee feasibility and optimality of the proposed design. Numerical results show that when the bid-ask spread of electricity is small enough, the proposed algorithm can achieve the close-to-optimal cost asymptotically.

Submitted 8 April, 2022; originally announced April 2022.

Comments: Accepted by Applied Energy

arXiv:2204.01645 [pdf, other]

Three-dimensional Microstructural Image Synthesis from 2D Backscattered Electron Image of Cement Paste

Authors: Xin Zhao, Xu Wu, Lin Wang, Pengkun Hou, Qinfei Li, Yuxuan Zhang, Bo Yang

Abstract: The microstructure is significant for exploring the physical properties of hardened cement paste. In general, the microstructures of hardened cement paste are obtained by microscopy. As a popular method, scanning electron microscopy (SEM) can acquire high-quality 2D images but fails to obtain 3D microstructures. Although several methods, such as microtomography (Micro-CT) and Focused Ion Beam Scanning Electron Microscopy (FIB-SEM), can acquire 3D microstructures, these fail to obtain high-quality 3D images or consume considerable cost. To address these issues, a method based on solid texture synthesis is proposed, synthesizing high-quality 3D microstructural image of hardened cement paste. This method includes 2D backscattered electron (BSE) image acquisition and 3D microstructure synthesis phases. In the approach, the synthesis model is based on solid texture synthesis, capturing microstructure information of the acquired 2D BSE image and generating high-quality 3D microstructures. In experiments, the method is verified on actual 3D Micro-CT images and 2D BSE images. Finally, qualitative experiments demonstrate that the 3D microstructures generated by our method have similar visual characteristics to the given 2D example. Furthermore, quantitative experiments prove that the synthetic 3D results are consistent with the actual instance in terms of porosity, particle size distribution, and grey scale co-occurrence matrix.

Submitted 4 April, 2022; originally announced April 2022.

Comments: 25 pages, 9 figures

arXiv:2203.15149 [pdf, other]

doi 10.21437/Interspeech.2022-517

CMGAN: Conformer-based Metric GAN for Speech Enhancement

Authors: Ruizhe Cao, Sherif Abdulatif, Bin Yang

Abstract: Recently, the convolution-augmented transformer (Conformer) has achieved promising performance in automatic speech recognition (ASR) and time-domain speech enhancement (SE), as it can capture both local and global dependencies in the speech signal. In this paper, we propose a conformer-based metric generative adversarial network (CMGAN) for SE in the time-frequency (TF) domain. In the generator, we utilize two-stage conformer blocks to aggregate all magnitude and complex spectrogram information by modeling both time and frequency dependencies. The estimation of magnitude and complex spectrogram is decoupled in the decoder stage and then jointly incorporated to reconstruct the enhanced speech. In addition, a metric discriminator is employed to further improve the quality of the enhanced estimated speech by optimizing the generator with respect to a corresponding evaluation score. Quantitative analysis on the Voice Bank+DEMAND dataset indicates the capability of CMGAN in outperforming various previous models with a margin, i.e., PESQ of 3.41 and SSNR of 11.10 dB.

Submitted 3 March, 2024; v1 submitted 28 March, 2022; originally announced March 2022.

Comments: 5 pages, 1 figure, 2 tables, published in INTERSPEECH 2022

Journal ref: Proceedings of INTERSPEECH, 2022, pp. 936--940

arXiv:2203.00270 [pdf, other]

Bidirectional Pricing and Demand Response for Nanogrids with HVAC Systems

Authors: Jiaxin Cao, Bo Yang, Shanying Zhu, Kai Ma, ** Guan

Abstract: Owing to fluctuating renewable generation and power demand, the energy surplus or deficit in each nanogrid varies over time. To stimulate local renewable energy consumption and minimize the long-term energy cost, some issues remain to be explored: when and how the energy demand and bidirectional trading prices are scheduled considering personal comfort preferences and environmental factors. For this purpose, the demand response and two-way pricing problems for nanogrids and a public monitoring entity (PME) are studied concurrently by exploiting the large potential thermal elasticity of heating, ventilation and air-conditioning (HVAC) units. Different from nanogrids in terms of minimizing time-average costs, the PME aims to set reasonable prices and optimize profits by trading with nanogrids and the main grid bi-directionally. In particular, this bilevel energy management problem is formulated in stochastic form over a long-term horizon. Since there are uncertain system parameters, time-coupled queue constraints and the interplay of bilevel decision-making, it is challenging to solve the formulated problems. To this end, we derive a form of relaxation based on the Lyapunov optimization technique to make the energy management problem tractable without forecasting the related system parameters. The transaction between nanogrids and PME is captured by a one-leader and multi-follower Stackelberg game framework. Then, theoretical analysis of the existence and uniqueness of Stackelberg equilibrium (SE) is developed based on the proposed game property. Following that, we devise an optimization algorithm to reach the SE with less information exchange. Numerical experiments validate the effectiveness of the proposed approach.

Submitted 1 March, 2022; originally announced March 2022.

Showing 1–50 of 121 results for author: Yang, B