Search | arXiv e-print repository

A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition

Authors: Zhenyu Zhou, Shibiao Xu, Shi Yin, Lantian Li, Dong Wang

Abstract: Data augmentation (DA) has played a pivotal role in the success of deep speaker recognition. Current DA techniques primarily focus on speaker-preserving augmentation, which does not change the speaker trait of the speech and does not create new speakers. Recent research has shed light on the potential of speaker augmentation, which generates new speakers to enrich the training dataset. In this study, we delve into two speaker augmentation approaches: speed perturbation (SP) and vocal tract length perturbation (VTLP). Despite the empirical utilization of both methods, a comprehensive investigation into their efficacy is lacking. Our study, conducted using two public datasets, VoxCeleb and CN-Celeb, revealed that both SP and VTLP are proficient at generating new speakers, leading to significant performance improvements in speaker recognition. Furthermore, they exhibit distinct properties in sensitivity to perturbation factors and data complexity, hinting at the potential benefits of their fusion. Our research underscores the substantial potential of speaker augmentation, highlighting the importance of in-depth exploration and analysis.

Submitted 11 June, 2024; originally announced June 2024.

Comments: to be published in INTERSPEECH 2024
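
Note on the entry above: speed perturbation can be illustrated as plain resampling of the waveform, which shifts both tempo and pitch and is why each perturbed copy can be treated as a new speaker. The following is a minimal sketch only, not the authors' pipeline. The linear-interpolation resampler, the factors 0.9 and 1.1, and the `augmented_speaker_id` labelling helper are assumptions made for illustration; real toolkits typically use higher-quality resamplers.

```python
import numpy as np

def speed_perturb(wave: np.ndarray, factor: float) -> np.ndarray:
    """Resample `wave` so it plays `factor` times faster (linear interpolation)."""
    n_out = int(round(len(wave) / factor))
    # Position in the original signal that each output sample reads from.
    src_pos = np.arange(n_out) * factor
    return np.interp(src_pos, np.arange(len(wave)), wave)

def augmented_speaker_id(speaker: str, factor: float) -> str:
    """Hypothetical labelling scheme: a non-unit factor yields a new speaker label."""
    return speaker if factor == 1.0 else f"{speaker}-sp{factor:g}"

# One second of a 220 Hz tone stands in for a speech utterance.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 220.0 * t)

for factor in (0.9, 1.0, 1.1):  # assumed perturbation factors
    perturbed = speed_perturb(tone, factor)
    print(augmented_speaker_id("id0001", factor), len(perturbed))
```

A factor above 1 shortens the utterance and raises its pitch; a factor below 1 does the opposite.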

arXiv:2406.06543 [pdf, other]

SparrowSNN: A Hardware/software Co-design for Energy Efficient ECG Classification

Authors: Zhanglu Yan, Zhenyu Bai, Tulika Mitra, Weng-Fai Wong

Abstract: Heart disease is one of the leading causes of death worldwide. Given its high risk and often asymptomatic nature, real-time continuous monitoring is essential. Unlike traditional artificial neural networks (ANNs), spiking neural networks (SNNs) are well-known for their energy efficiency, making them ideal for wearable devices and energy-constrained edge computing platforms. However, current energy measurements of SNN implementations for detecting heart diseases typically rely on empirical values, often overlooking hardware overhead. Additionally, the integrate-and-fire activations in SNNs require multiple memory accesses and repeated computations, which can further compromise energy efficiency. In this paper, we propose sparrowSNN, a redesign of the standard SNN workflow from a hardware perspective, and present a dedicated ASIC design for SNNs, optimized for ultra-low power wearable devices used in heartbeat classification. Using the MIT-BIH dataset, our SNN achieves a state-of-the-art accuracy of 98.29% for SNNs, with energy consumption of 31.39 nJ per inference and power usage of 6.1 µW, giving sparrowSNN the highest accuracy and the lowest energy use among comparable systems. We also compare the energy-to-accuracy trade-offs between SNNs and quantized ANNs, offering recommendations on how best to use SNNs.

Submitted 6 May, 2024; originally announced June 2024.

arXiv:2406.00485 [pdf]

TacShade: A New 3D-printed Soft Optical Tactile Sensor Based on Light, Shadow and Greyscale for Shape Reconstruction

Authors: Zhenyu Lu, Jialong Yang, Haoran Li, Yifan Li, Weiyong Si, Nathan Lepora, Chenguang Yang

Abstract: In this paper, we present TacShade, a newly designed 3D-printed soft optical tactile sensor. The sensor is developed for shape reconstruction, inspired by sketch drawing, which uses the density of sketch lines to render light and shadow and so create a 3D-view effect. TacShade builds upon the strengths of the TacTip, a single-camera tactile sensor that allows large in-depth deformation and is sensitive to edge and surface following, and improves the structure by distributing the markers within the gaps between papillae pins. Variations in light, dark, and grey effects can be generated inside the sensor through external contact interactions. The contours of the contacting objects are outlined by white markers, while the contact depth characteristics can be indirectly obtained from the distribution of black pins and white markers, creating a 2.5D visualization. Based on the imaging effect, we improve the Shape from Shading (SFS) algorithm to process tactile images, enabling a coarse but fast reconstruction of the contact objects. Two experiments are performed. The first verifies TacShade's ability to reconstruct the shape of the contact objects from one image for object distinction. The second experiment shows the shape reconstruction capability of TacShade for a large panel with ridged patterns based on the location of robots and image splicing technology.

Submitted 1 June, 2024; originally announced June 2024.

Comments: This paper has been accepted by ICRA 2024

arXiv:2404.15643 [pdf, ps, other]

Dynamic Beam Coverage for Satellite Communications Aided by Movable-Antenna Array

Authors: Lipeng Zhu, Xiangyu Pi, Wenyan Ma, Zhenyu Xiao, Rui Zhang

Abstract: Due to the ultra-dense constellation, efficient beam coverage and interference mitigation are crucial to low-earth orbit (LEO) satellite communication systems, while the conventional directional antennas and fixed-position antenna (FPA) arrays both have limited degrees of freedom (DoFs) in beamforming to adapt to the time-varying coverage requirement of terrestrial users. To address this challenge, we propose in this paper utilizing movable antenna (MA) arrays to enhance the satellite beam coverage and interference mitigation. Specifically, given the satellite orbit and the coverage requirement within a specific time interval, the antenna position vector (APV) and antenna weight vector (AWV) of the satellite-mounted MA array are jointly optimized over time to minimize the average signal leakage power to the interference area of the satellite, subject to the constraints of the minimum beamforming gain over the coverage area, the continuous movement of MAs, and the constant modulus of AWV. The corresponding continuous-time decision process for the APV and AWV is first transformed into a more tractable discrete-time optimization problem. Then, an alternating optimization (AO)-based algorithm is developed by iteratively optimizing the APV and AWV, where the successive convex approximation (SCA) technique is utilized to obtain locally optimal solutions during the iterations. Moreover, to further reduce the antenna movement overhead, a low-complexity MA scheme is proposed by using an optimized common APV over all time slots. Simulation results validate that the proposed MA array-aided beam coverage schemes can significantly decrease the interference leakage of the satellite compared to conventional FPA-based schemes, while the low-complexity MA scheme can achieve a performance comparable to the continuous-movement scheme.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.13786 [pdf, other]

Soar: Design and Deployment of A Smart Roadside Infrastructure System for Autonomous Driving

Authors: Shuyao Shi, Neiwen Ling, Zhehao Jiang, Xuan Huang, Yuze He, Xiaoguang Zhao, Bufang Yang, Chen Bian, Jingfei Xia, Zhenyu Yan, Raymond Yeung, Guoliang Xing

Abstract: Recently, smart roadside infrastructure (SRI) has demonstrated the potential of achieving fully autonomous driving systems. To explore the potential of infrastructure-assisted autonomous driving, this paper presents the design and deployment of Soar, the first end-to-end SRI system specifically designed to support autonomous driving systems. Soar consists of both software and hardware components carefully designed to overcome various system and physical challenges. Soar can leverage existing operational infrastructure such as street lampposts to lower the barrier to adoption. Soar adopts a new communication architecture that comprises a bi-directional multi-hop I2I network and a downlink I2V broadcast service, which are designed based on off-the-shelf 802.11ac interfaces in an integrated manner. Soar also features a hierarchical DL task management framework to achieve desirable load balancing among nodes and enable them to collaborate efficiently to run multiple data-intensive autonomous driving applications. We deployed a total of 18 Soar nodes on existing lampposts on campus, which have been operational for over two years. Our real-world evaluation shows that Soar can support a diverse set of autonomous driving applications and achieve desirable real-time performance and high communication reliability. Our findings and experiences in this work offer key insights into the development and deployment of next-generation smart roadside infrastructure and autonomous driving systems.

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.06674 [pdf, other]

VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing

Authors: Philip Anastassiou, Zhenyu Tang, Kainan Peng, Dongya Jia, Jiaxin Li, Ming Tu, Yuping Wang, Yuxuan Wang, Mingbo Ma

Abstract: We present VoiceShop, a novel speech-to-speech framework that can modify multiple attributes of speech, such as age, gender, accent, and speech style, in a single forward pass while preserving the input speaker's timbre. Previous works have been constrained to specialized models that can only edit these attributes individually and suffer from the following pitfalls: the magnitude of the conversion effect is weak, there is no zero-shot capability for out-of-distribution speakers, or the synthesized outputs exhibit undesirable timbre leakage. Our work proposes solutions for each of these issues in a simple modular framework based on a conditional diffusion backbone model with optional normalizing flow-based and sequence-to-sequence speaker attribute-editing modules, whose components can be combined or removed during inference to meet a wide array of tasks without additional model finetuning. Audio samples are available at https://voiceshopai.github.io.

Submitted 11 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.06265 [pdf, other]

Spatial-Temporal Multi-level Association for Video Object Segmentation

Authors: Deshui Miao, Xin Li, Zhenyu He, Huchuan Lu, Ming-Hsuan Yang

Abstract: Existing semi-supervised video object segmentation methods either focus on temporal feature matching or spatial-temporal feature modeling. However, they do not address the issues of sufficient target interaction and efficient parallel processing simultaneously, thereby constraining the learning of dynamic, target-aware features. To tackle these limitations, this paper proposes a spatial-temporal multi-level association framework, which jointly associates reference frame, test frame, and object features to achieve sufficient interaction and parallel target ID association with a spatial-temporal memory bank for efficient video object segmentation. Specifically, we construct a spatial-temporal multi-level feature association module to learn better target-aware features, which formulates feature extraction and interaction as the efficient operations of object self-attention, reference object enhancement, and test reference correlation. In addition, we propose a spatial-temporal memory to assist feature association and temporal ID assignment and correlation. We evaluate the proposed method by conducting extensive experiments on numerous video object segmentation datasets, including DAVIS 2016/2017 val, DAVIS 2017 test-dev, and YouTube-VOS 2018/2019 val. The favorable performance against the state-of-the-art methods demonstrates the effectiveness of our approach. All source code and trained models will be made publicly available.

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.01717 [pdf, other]

AddSR: Accelerating Diffusion-based Blind Super-Resolution with Adversarial Diffusion Distillation

Authors: Rui Xie, Ying Tai, Chen Zhao, Kai Zhang, Zhenyu Zhang, Jun Zhou, Xiaoqian Ye, Qian Wang, Jian Yang

Abstract: Blind super-resolution methods based on stable diffusion showcase formidable generative capabilities in reconstructing clear high-resolution images with intricate details from low-resolution inputs. However, their practical applicability is often hampered by poor efficiency, stemming from the requirement of hundreds or thousands of sampling steps. Inspired by the efficient adversarial diffusion distillation (ADD), we design AddSR to address this issue by incorporating the ideas of both distillation and ControlNet. Specifically, we first propose a prediction-based self-refinement strategy to provide high-frequency information in the student model output with marginal additional time cost. Furthermore, we refine the training process by employing HR images, rather than LR images, to regulate the teacher model, providing a more robust constraint for distillation. Second, we introduce a timestep-adaptive ADD to address the perception-distortion imbalance problem introduced by the original ADD. Extensive experiments demonstrate that AddSR generates better restoration results while running faster than previous SD-based state-of-the-art models (e.g., 7× faster than SeeSR).

Submitted 23 May, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2403.10723 [pdf, other]

Leveraging Symmetries in Gaits for Reinforcement Learning: A Case Study on Quadrupedal Gaits

Authors: Jiayu Ding, Xulin Chen, Garret E. Katz, Zhenyu Gan

Abstract: In this research, we address the complex task of developing versatile and agile quadrupedal gaits for robotic platforms, a domain predominantly governed by model-based trajectory optimization methods. We propose an innovative, reference-free reinforcement learning framework that exploits the intrinsic symmetries of dynamic systems to synthesize a broad array of naturalistic quadrupedal locomotion patterns. By capitalizing on distinct symmetry characteristics (namely temporal, morphological, and time-reversal), our approach efficiently facilitates the generation of, and transition among, diverse gaits such as pronking, bounding, half-bounding, and galloping across a spectrum of velocities, circumventing the necessity for expert-generated trajectories or complex reward structures. Implemented on the Petoi Bittle robotic model, our methodology illustrates robust and adaptable gait generation capabilities, significantly broadening the scope for robotic mobility and speed adaptability. This contribution not only advances our comprehension of quadrupedal locomotion mechanisms but also underscores the pivotal role of symmetry in the development of scalable and effective robotic gait strategies. Our findings hold substantial implications for robotic design and control, potentially enhancing operational versatility and efficiency across a variety of deployment environments.

Submitted 14 June, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

arXiv:2402.15185 [pdf, other]

Pre-Chirp-Domain Index Modulation for Affine Frequency Division Multiplexing

Authors: Guangyao Liu, Tianqi Mao, Ruiqi Liu, Zhenyu Xiao

Abstract: Affine frequency division multiplexing (AFDM), tailored as a novel multicarrier technique utilizing chirp signals for high-mobility communications, exhibits marked advantages compared to traditional orthogonal frequency division multiplexing (OFDM). AFDM is based on the discrete affine Fourier transform (DAFT) with two modifiable parameters of the chirp signals, termed the pre-chirp parameter and the post-chirp parameter, respectively. These parameters can be fine-tuned to avoid overlapping channel paths with different delays or Doppler shifts, leading to performance enhancement, especially for doubly dispersive channels. In this paper, we propose a novel AFDM structure with the pre-chirp index modulation (PIM) philosophy (AFDM-PIM), which can embed additional information bits into the pre-chirp parameter design for both spectral and energy efficiency enhancement. Specifically, we first demonstrate that the application of distinct pre-chirp parameters to various subcarriers in the AFDM modulation process maintains the orthogonality among these subcarriers. Then, different pre-chirp parameters are flexibly assigned to each AFDM subcarrier according to the incoming bits. By such arrangement, aside from classical phase/amplitude modulation, extra binary bits can be implicitly conveyed by the indices of the selected pre-chirp parameter realizations without additional energy consumption. At the receiver, both a maximum likelihood (ML) detector and a reduced-complexity ML-minimum mean square error (ML-MMSE) detector are employed to recover the information bits. It has been shown via simulations that the proposed AFDM-PIM exhibits superior bit error rate (BER) performance compared to classical AFDM, OFDM and IM-aided OFDM algorithms.

Submitted 23 February, 2024; originally announced February 2024.

arXiv:2402.02699 [pdf, other]

Adversarial Data Augmentation for Robust Speaker Verification

Authors: Zhenyu Zhou, Junhui Chen, Namin Wang, Lantian Li, Dong Wang

Abstract: Data augmentation (DA) has gained widespread popularity in deep speaker models due to its ease of implementation and significant effectiveness. It enriches training data by simulating real-life acoustic variations, enabling deep neural networks to learn speaker-related representations while disregarding irrelevant acoustic variations, thereby improving robustness and generalization. However, a potential issue with the vanilla DA is augmentation residual, i.e., unwanted distortion caused by different types of augmentation. To address this problem, this paper proposes a novel approach called adversarial data augmentation (A-DA) which combines DA with adversarial learning. Specifically, it involves an additional augmentation classifier to categorize various augmentation types used in data augmentation. This adversarial learning empowers the network to generate speaker embeddings that can deceive the augmentation classifier, making the learned speaker embeddings more robust in the face of augmentation variations. Experiments conducted on VoxCeleb and CN-Celeb datasets demonstrate that our proposed A-DA outperforms standard DA in both augmentation matched and mismatched test conditions, showcasing its superior robustness and generalization against acoustic variations.

Submitted 4 February, 2024; originally announced February 2024.

arXiv:2402.00771 [pdf, other]

Mixed Static and Reconfigurable Metasurface Deployment in Indoor Dense Spaces: How Much Reconfigurability is Needed?

Authors: Zhenyu Li, Ozan Alp Topal, Özlem Tuğfe Demir, Emil Björnson, Cicek Cavdar

Abstract: In this paper, we investigate how metasurfaces can be deployed to deliver high data rates in a millimeter-wave (mmWave) indoor dense space with many blocking objects. These surfaces can either be static metasurfaces (SMSs) that reflect with fixed phase-shifts or reconfigurable intelligent surfaces (RISs) that can reconfigure their phase-shifts to the currently served user. The latter comes with an increased power, cabling, and signaling cost. To see how reconfigurability affects the network performance, we propose an iterative algorithm based on the feasible point pursuit successive convex approximation method. We jointly optimize the types and phase-shifts of the surfaces and the time portion allocated to each user equipment to maximize the minimum data rate achieved by the network. Our numerical results demonstrate that the minimum data rate improves as more RISs are introduced but the gain diminishes after some point. Therefore, introducing more reconfigurability is not always necessary. Another result shows that to reach the same data rate achieved by using 22 SMSs, at least 18 RISs are needed. This suggests that when it is costly to deploy many RISs, as an inexpensive alternative solution, one can reach the same data rate just by densely deploying more SMSs.

Submitted 1 February, 2024; originally announced February 2024.

Comments: 6 pages, 5 figures, Accepted to be presented in IEEE WCNC 2024

arXiv:2401.08974 [pdf, ps, other]

Performance Analysis and Optimization for Movable Antenna Aided Wideband Communications

Authors: Lipeng Zhu, Wenyan Ma, Zhenyu Xiao, Rui Zhang

Abstract: Movable antenna (MA) has emerged as a promising technology to enhance wireless communication performance by enabling the local movement of antennas at the transmitter (Tx) and/or receiver (Rx) for achieving more favorable channel conditions. As the existing studies on MA-aided wireless communications have mainly considered narrow-band transmission in flat fading channels, we investigate in this paper the MA-aided wideband communications employing orthogonal frequency division multiplexing (OFDM) in frequency-selective fading channels. Under the general multi-tap field-response channel model, the wireless channel variations in both space and frequency are characterized with different positions of the MAs. Unlike the narrow-band transmission where the optimal MA position at the Tx/Rx simply maximizes the single-tap channel amplitude, the MA position in the wideband case needs to balance the amplitudes and phases over multiple channel taps in order to maximize the OFDM transmission rate over multiple frequency subcarriers. First, we derive an upper bound on the OFDM achievable rate in closed form when the size of the Tx/Rx region for antenna movement is arbitrarily large. Next, we develop a parallel greedy ascent (PGA) algorithm to obtain locally optimal solutions to the MAs' positions for OFDM rate maximization subject to finite-size Tx/Rx regions. To reduce computational complexity, a simplified PGA algorithm is also provided to optimize the MAs' positions more efficiently. Simulation results demonstrate that the proposed PGA algorithms can approach the OFDM rate upper bound closely with the increase of Tx/Rx region sizes and outperform conventional systems with fixed-position antennas (FPAs) under the wideband channel setup.

Submitted 16 January, 2024; originally announced January 2024.
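
Note on the entry above: the abstract's point that a wideband antenna position must balance amplitudes and phases across channel taps can be seen from a textbook OFDM rate computation. The sketch below is an illustration under stated assumptions, not the paper's rate expression: it assumes equal power on every subcarrier, ignores cyclic-prefix overhead, and uses a hypothetical `ofdm_rate` helper and made-up tap values.

```python
import numpy as np

def ofdm_rate(taps: np.ndarray, n_subcarriers: int, snr: float) -> float:
    """Average rate in bit/s/Hz over subcarriers, assuming equal power allocation."""
    # Frequency response on each subcarrier is the DFT of the time-domain taps.
    freq_response = np.fft.fft(taps, n=n_subcarriers)
    return float(np.mean(np.log2(1.0 + snr * np.abs(freq_response) ** 2)))

# Two hypothetical channels with identical tap amplitudes but different phases.
taps_a = np.array([1.0, 0.5 + 0.0j, 0.25])
taps_b = np.array([1.0, 0.5j, -0.25])

print(ofdm_rate(taps_a, n_subcarriers=64, snr=10.0))
print(ofdm_rate(taps_b, n_subcarriers=64, snr=10.0))
```

Because the tap phases change how the taps combine on each subcarrier, two channels with the same tap amplitudes can give different rates, which is why maximizing a single tap's amplitude is no longer sufficient.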

arXiv:2401.00806 [pdf, other]

Noise-Aware and Equitable Urban Air Traffic Management: An Optimization Approach

Authors: Zhenyu Gao, Yue Yu, Qinshuang Wei, Ufuk Topcu, John-Paul Clarke

Abstract: Urban air mobility (UAM), a transformative concept for the transport of passengers and cargo, faces several integration challenges in complex urban environments. Community acceptance of aircraft noise is among the most noticeable of these challenges when launching or scaling up a UAM system. Properly managing community noise is fundamental to establishing a UAM system that is environmentally and socially sustainable. In this work, we develop a holistic and equitable approach to manage UAM air traffic and its community noise impact in urban environments. The proposed approach is a hybrid approach that considers a mix of different noise mitigation strategies, including limiting the number of operations, cruising at higher altitudes, and ambient noise masking. We tackle the problem through the lens of network system control and formulate a multi-objective optimization model for managing traffic flow in a multi-layer UAM network while concurrently pursuing demand fulfillment, noise control, and energy saving. Further, we use a social welfare function in the optimization model as the basis for the efficiency-fairness trade-off in both demand fulfillment and noise control. We apply the proposed approach to a comprehensive case study in the city of Austin and perform design trade-offs through both visual and quantitative analyses.

Submitted 1 January, 2024; originally announced January 2024.

Comments: 30 pages, 15 figures

arXiv:2312.13603 [pdf, other]

Style Modeling for Multi-Speaker Articulation-to-Speech

Authors: Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang

Abstract: In this paper, we propose a neural articulation-to-speech (ATS) framework that synthesizes high-quality speech from articulatory signals in a multi-speaker situation. Most conventional ATS approaches only focus on modeling contextual information of speech from a single speaker's articulatory features. To explicitly represent each speaker's speaking style as well as the contextual information, our proposed model estimates style embeddings, guided by essential speech style attributes such as pitch and energy. We adopt convolutional layers and transformer-based attention layers for our model to fully utilize both local and global information of articulatory signals, measured by electromagnetic articulography (EMA). Our model significantly improves the quality of synthesized speech compared to the baseline in terms of objective and subjective measurements on the Haskins dataset.

Submitted 21 December, 2023; originally announced December 2023.

Comments: 5 pages, Accepted to ICASSP 2023

arXiv:2312.13600 [pdf, other]

BrainTalker: Low-Resource Brain-to-Speech Synthesis with Transfer Learning using Wav2Vec 2.0

Authors: Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang

Abstract: Decoding spoken speech from neural activity in the brain is a fast-emerging research topic, as it could enable communication for people who have difficulties with producing audible speech. For this task, electrocorticography (ECoG) is a common method for recording brain activity with high temporal resolution and high spatial precision. However, due to the risky surgical procedure required for obtaining ECoG recordings, relatively little of this data has been collected, and the amount is insufficient to train a neural network-based Brain-to-Speech (BTS) system. To address this problem, we propose BrainTalker, a novel BTS framework that generates intelligible spoken speech from ECoG signals under extremely low-resource scenarios. We apply a transfer learning approach utilizing a pre-trained self-supervised model, Wav2Vec 2.0. Specifically, we train an encoder module to map ECoG signals to latent embeddings that match Wav2Vec 2.0 representations of the corresponding spoken speech. These embeddings are then transformed into mel-spectrograms using stacked convolutional and transformer-based layers, which are fed into a neural vocoder to synthesize speech waveforms. Experimental results demonstrate that our proposed framework achieves outstanding performance in terms of subjective and objective metrics, including a Pearson correlation coefficient of 0.9 between generated and ground-truth mel-spectrograms. We share publicly available demos and code.

Submitted 21 December, 2023; originally announced December 2023.

Comments: 5 pages. Accepted to BHI 2023

arXiv:2312.06969 [pdf, ps, other]

Channel Estimation for Movable Antenna Communication Systems: A Framework Based on Compressed Sensing

Authors: Zhenyu Xiao, Songqi Cao, Lipeng Zhu, Yanming Liu, Xiang-Gen Xia, Rui Zhang

Abstract: Movable antenna (MA) is a new technology with great potential to improve communication performance by enabling local movement of antennas for pursuing better channel conditions. In particular, the acquisition of complete channel state information (CSI) between the transmitter (Tx) and receiver (Rx) regions is an essential problem for MA systems to reap performance gains. In this paper, we propose a general channel estimation framework for MA systems by exploiting the multi-path field response channel structure. Specifically, the angles of departure (AoDs), angles of arrival (AoAs), and complex coefficients of the multi-path components (MPCs) are jointly estimated by employing the compressed sensing method, based on multiple channel measurements at designated positions of the Tx-MA and Rx-MA. Under this framework, the Tx-MA and Rx-MA measurement positions fundamentally determine the measurement matrix for compressed sensing, of which the mutual coherence is analyzed from the perspective of Fourier transform. Moreover, two criteria for MA measurement positions are provided to guarantee the successful recovery of MPCs. Then, we propose several MA measurement position setups and compare their performance. Finally, comprehensive simulation results show that the proposed framework is able to estimate the complete CSI between the Tx and Rx regions with a high accuracy.
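
The abstract above refers to the mutual coherence of a compressed-sensing measurement matrix. As a rough illustration only, the standard definition (the largest normalized inner product between two distinct columns) can be sketched as follows. The function name `mutual_coherence` and the column-list representation are assumptions made for this example and are not taken from the paper, and the sketch assumes no column is all zeros.

```python
import math

def mutual_coherence(columns):
    """Largest normalized |<a_i, a_j>| over distinct columns (hypothetical helper).

    `columns` is a list of equal-length columns with real or complex entries.
    Assumes every column has a non-zero norm.
    """
    def inner(a, b):
        # Hermitian inner product: conjugate the first argument.
        return sum(x.conjugate() * y for x, y in zip(a, b))

    norms = [math.sqrt(abs(inner(c, c))) for c in columns]
    worst = 0.0
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            value = abs(inner(columns[i], columns[j])) / (norms[i] * norms[j])
            worst = max(worst, value)
    return worst
```

A lower value means the columns are closer to orthogonal, which is generally favorable for sparse recovery; how the paper relates this quantity to antenna measurement positions is not reproduced here.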

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2312.06454 [pdf, other]

Point Transformer with Federated Learning for Predicting Breast Cancer HER2 Status from Hematoxylin and Eosin-Stained Whole Slide Images

Authors: Bao Li, Zhenyu Liu, Lizhi Shao, Bensheng Qiu, Hong Bu, Jie Tian

Abstract: Directly predicting human epidermal growth factor receptor 2 (HER2) status from widely available hematoxylin and eosin (HE)-stained whole slide images (WSIs) can reduce technical costs and expedite treatment selection. Accurately predicting HER2 requires large collections of multi-site WSIs. Federated learning enables collaborative training of these WSIs without gigabyte-size WSIs transportation and data privacy concerns. However, federated learning encounters challenges in addressing label imbalance in multi-site WSIs from the real world. Moreover, existing WSI classification methods cannot simultaneously exploit local context information and long-range dependencies in the site-end feature representation of federated learning. To address these issues, we present a point transformer with federated learning for multi-site HER2 status prediction from HE-stained WSIs. Our approach incorporates two novel designs. We propose a dynamic label distribution strategy and an auxiliary classifier, which help to establish a well-initialized model and mitigate label distribution variations across sites. Additionally, we propose a farthest cosine sampling based on cosine distance. It can sample the most distinctive features and capture the long-range dependencies. Extensive experiments and analysis show that our method achieves state-of-the-art performance at four sites with a total of 2687 WSIs. Furthermore, we demonstrate that our model can generalize to two unseen sites with 229 WSIs.
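
The abstract does not spell out how farthest cosine sampling works. One plausible reading, shown here purely as an assumption, is classic greedy farthest-point sampling with Euclidean distance swapped for cosine distance. The names `cosine_distance` and `farthest_cosine_sampling` and the `start` parameter are hypothetical, and the sketch assumes no feature vector is all zeros.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def farthest_cosine_sampling(features, k, start=0):
    """Greedily pick k indices, each maximizing its distance to the picked set."""
    selected = [start]
    # min_dist[i] = distance from feature i to its nearest selected feature.
    min_dist = [cosine_distance(f, features[start]) for f in features]
    while len(selected) < k:
        nxt = max(range(len(features)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, f in enumerate(features):
            min_dist[i] = min(min_dist[i], cosine_distance(f, features[nxt]))
    return selected
```

With four 2-D features, three of which point in clearly different directions, the sampler skips the near-duplicate of the starting vector, which is the "most distinctive features" behavior the abstract describes.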

Submitted 27 February, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

arXiv:2311.07169 [pdf, other]

CASTER: A Computer-Vision-Assisted Wireless Channel Simulator for Gesture Recognition

Authors: Zhenyu Ren, Guoliang Li, Chenqing Ji, Chao Yu, Shuai Wang, Rui Wang

Abstract: In this paper, a computer-vision-assisted simulation method is proposed to address the issue of training dataset acquisition for wireless hand gesture recognition. In the existing literature, in order to classify gestures via the wireless channel estimation, massive training samples should be measured in a consistent environment, consuming significant effort. In the proposed CASTER simulator, however, the training dataset can be simulated via existing videos. Particularly, a gesture is represented by a sequence of snapshots, and the channel impulse response of each snapshot is calculated via tracing the rays scattered off a primitive-based hand model. Moreover, the CASTER simulator relies on the existing videos to extract the motion data of gestures. Thus, the massive measurements of the wireless channel can be eliminated. The experiments demonstrate a 90.8% average classification accuracy of simulation-to-reality inference.

Submitted 17 April, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

Comments: 10 pages, 11 figures

arXiv:2310.09299 [pdf, other]

Digital Twin Assisted Deep Reinforcement Learning for Online Admission Control in Sliced Network

Authors: Zhenyu Tao, Wei Xu, Xiaohu You

Abstract: The proliferation of diverse wireless services in 5G and beyond has led to the emergence of network slicing technologies. Among these, admission control plays a crucial role in achieving service-oriented optimization goals through the selective acceptance of service requests. Although deep reinforcement learning (DRL) forms the foundation in many admission control approaches thanks to its effectiveness and flexibility, initial instability with excessive convergence delay of DRL models hinders their deployment in real-world networks. We propose a digital twin (DT) accelerated DRL solution to address this issue. Specifically, we first formulate the admission decision-making process as a semi-Markov decision process, which is subsequently simplified into an equivalent discrete-time Markov decision process to facilitate the implementation of DRL methods. A neural network-based DT is established with a customized output layer for queuing systems, trained through supervised learning, and then employed to assist the training phase of the DRL model. Extensive simulations show that the DT-accelerated DRL improves resource utilization by over 40% compared to the directly trained state-of-the-art dueling deep Q-learning model. This improvement is achieved while preserving the model's capability to optimize the long-term rewards of the admission process.

Submitted 21 November, 2023; v1 submitted 7 October, 2023; originally announced October 2023.

Comments: 13 pages, 8 figures

arXiv:2310.03402 [pdf, other]

A Complementary Global and Local Knowledge Network for Ultrasound denoising with Fine-grained Refinement

Authors: Zhenyu Bu, Kai-Ni Wang, Fuxing Zhao, Shengxiao Li, Guang-Quan Zhou

Abstract: Ultrasound imaging serves as an effective and non-invasive diagnostic tool commonly employed in clinical examinations. However, the presence of speckle noise in ultrasound images invariably degrades image quality, impeding the performance of subsequent tasks, such as segmentation and classification. Existing methods for speckle noise reduction frequently induce excessive image smoothing or fail to preserve detailed information adequately. In this paper, we propose a complementary global and local knowledge network for ultrasound denoising with fine-grained refinement. Initially, the proposed architecture employs the L-CSwinTransformer as encoder to capture global information, incorporating CNN as decoder to fuse local features. We expand the resolution of the feature at different stages to extract more global information compared to the original CSwinTransformer. Subsequently, we integrate Fine-grained Refinement Block (FRB) within the skip-connection stage to further augment features. We validate our model on two public datasets, HC18 and BUSI. Experimental results demonstrate that our model can achieve competitive performance in both quantitative metrics and visual performance. Our code will be available at https://github.com/AAlkaid/USDenoising.

Submitted 5 October, 2023; originally announced October 2023.

Comments: Submitted to ICASSP 2024

arXiv:2309.14158 [pdf, other]

An Investigation of Distribution Alignment in Multi-Genre Speaker Recognition

Authors: Zhenyu Zhou, Junhui Chen, Namin Wang, Lantian Li, Dong Wang

Abstract: Multi-genre speaker recognition is becoming increasingly popular due to its ability to better represent the complexities of real-world applications. However, a major challenge is the significant shift in the distribution of speaker vectors across different genres. While distribution alignment is a common approach to address this challenge, previous studies have mainly focused on aligning a source domain with a target domain, and the performance of multi-genre data is unknown. This paper presents a comprehensive study of mainstream distribution alignment methods on multi-genre data, where multiple distributions need to be aligned. We analyze various methods both qualitatively and quantitatively. Our experiments on the CN-Celeb dataset show that within-between distribution alignment (WBDA) performs relatively better. However, we also found that none of the investigated methods consistently improved performance in all test cases. This suggests that solely aligning the distributions of speaker vectors may not fully address the challenges posed by multi-genre speaker recognition. Further investigation is necessary to develop a more comprehensive solution.

Submitted 25 September, 2023; originally announced September 2023.

Comments: submitted to ICASSP 2024

arXiv:2308.09512 [pdf, other]

Multiuser Communications with Movable-Antenna Base Station: Joint Antenna Positioning, Receive Combining, and Power Control

Authors: Zhenyu Xiao, Xiangyu Pi, Lipeng Zhu, Xiang-Gen Xia, Rui Zhang

Abstract: Movable antenna (MA) is an emerging technology which enables a local movement of the antenna in the transmitter/receiver region for improving the channel condition and communication performance. In this paper, we study the deployment of multiple MAs at the base station (BS) for enhancing the multiuser communication performance. First, we model the multiuser channel in the uplink to characterize the wireless channel variation due to MAs' movements at the BS. Then, an optimization problem is formulated to maximize the minimum achievable rate among multiple users for MA-aided uplink multiuser communications by jointly optimizing the MAs' positions, their receive combining at the BS, and the transmit power of users, under the constraints of finite moving region for MAs, minimum inter-MA distance, and maximum transmit power of each user. To solve this challenging non-convex optimization problem, a two-loop iterative algorithm is proposed by leveraging the particle swarm optimization (PSO) method. Specifically, the outer-loop updates the positions of a set of particles, where each particle's position represents one realization of the antenna position vector (APV) of all MAs. The inner-loop implements the fitness evaluation for each particle in terms of the max-min achievable rate of multiple users with its corresponding APV, where the receive combining matrix of the BS and the transmit power of each user are optimized by applying the block coordinate descent (BCD) technique. Simulation results show that the antenna position optimization for MAs-aided BSs can significantly improve the rate performance as compared to conventional BSs with fixed-position antennas (FPAs).

Submitted 18 August, 2023; originally announced August 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2308.05546

arXiv:2308.05546 [pdf, other]

Multiuser Communications with Movable-Antenna Base Station Via Antenna Position Optimization

Authors: Xiangyu Pi, Lipeng Zhu, Zhenyu Xiao, Rui Zhang

Abstract: This paper studies the deployment of multiple movable antennas (MAs) at the base station (BS) for enhancing the multiuser communication performance. First, we model the multiuser channel in the uplink to characterize the wireless channel variation caused by MAs' movement at the BS. Then, an optimization problem is formulated to maximize the minimum achievable rate among multiple users for MA-aided uplink multiuser communications by jointly optimizing the MAs' positions, their receive combining at the BS, and the transmit power of users, under the constraints of finite moving region of MAs, minimum inter-MA distance, and maximum transmit power of each user. To solve this challenging non-convex optimization problem, a two-loop iterative algorithm is proposed by leveraging the particle swarm optimization (PSO) method. Specifically, the outer-loop updates the positions of a set of particles, where each particle's position represents one realization of the antenna positioning vector (APV) of all MAs. The inner-loop implements the fitness evaluation for each particle in terms of the max-min achievable rate of multiple users with its corresponding APV, where the receive combining matrix of the BS and the transmit power of each user are optimized by applying the block coordinate descent (BCD) technique. Simulation results show that the antenna position optimization for MAs-aided BS can significantly improve the rate performance as compared to conventional BS with fixed-position antennas (FPAs).
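
The two-loop structure described above (an outer PSO loop over particle positions, an inner fitness evaluation per particle) can be illustrated with a deliberately simplified sketch. Everything here is an assumption made for illustration: the position is a single scalar rather than an antenna position vector, the inner BCD step is replaced by a plain `fitness` callback, the inter-antenna distance constraint is omitted, and the inertia and acceleration constants (0.7, 1.5, 1.5) are common textbook defaults, not values from the paper.

```python
import random

def pso_maximize(fitness, lower, upper, n_particles=20, n_iters=50, seed=0):
    """Generic 1-D PSO that maximizes `fitness` on [lower, upper] (toy sketch)."""
    rng = random.Random(seed)
    pos = [rng.uniform(lower, upper) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = pos[:]                           # personal best positions
    best_fit = [fitness(p) for p in pos]        # "inner loop": evaluate each particle
    g = max(range(n_particles), key=lambda i: best_fit[i])
    g_pos, g_fit = best_pos[g], best_fit[g]     # global best so far
    history = [g_fit]
    for _ in range(n_iters):                    # "outer loop": move the particles
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (0.7 * vel[i]
                      + 1.5 * r1 * (best_pos[i] - pos[i])
                      + 1.5 * r2 * (g_pos - pos[i]))
            # Clamp to the feasible region (stand-in for the finite moving region).
            pos[i] = min(upper, max(lower, pos[i] + vel[i]))
            f = fitness(pos[i])
            if f > best_fit[i]:
                best_pos[i], best_fit[i] = pos[i], f
                if f > g_fit:
                    g_pos, g_fit = pos[i], f
        history.append(g_fit)
    return g_pos, g_fit, history
```

Calling `pso_maximize(lambda x: -(x - 2.0) ** 2, 0.0, 5.0)` searches for the maximizer of a toy concave objective. The returned `history` records the global best after each outer iteration and never decreases by construction.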

Submitted 10 August, 2023; originally announced August 2023.

arXiv:2308.00393 [pdf, other]

A Survey of Time Series Anomaly Detection Methods in the AIOps Domain

Authors: Zhenyu Zhong, Qiliang Fan, Jiacheng Zhang, Minghua Ma, Shenglin Zhang, Yongqian Sun, Qingwei Lin, Yuzhi Zhang, Dan Pei

Abstract: Internet-based services have seen remarkable success, generating vast amounts of monitored key performance indicators (KPIs) as univariate or multivariate time series. Monitoring and analyzing these time series are crucial for researchers, service operators, and on-call engineers to detect outliers or anomalies indicating service failures or significant events. Numerous advanced anomaly detection methods have emerged to address availability and performance issues. This review offers a comprehensive overview of time series anomaly detection in Artificial Intelligence for IT operations (AIOps), which uses AI capabilities to automate and optimize operational workflows. Additionally, it explores future directions for real-world and next-generation time-series anomaly detection based on recent advancements.

Submitted 1 August, 2023; originally announced August 2023.

arXiv:2307.12286 [pdf, ps, other]

Double-Active-IRS Aided Wireless Communication: Deployment Optimization and Capacity Scaling

Authors: Zhenyu Kang, Changsheng You, Rui Zhang

Abstract: In this letter, we consider a double-active-intelligent reflecting surface (IRS) aided wireless communication system, where two active IRSs are properly deployed to assist the communication from a base station (BS) to multiple users located in a given zone via the double-reflection links. Under the assumption of fixed per-element amplification power for each active-IRS element, we formulate a rate maximization problem subject to practical constraints on the reflection design, elements allocation, and placement of active IRSs. To solve this non-convex problem, we first obtain the optimal active-IRS reflections and BS beamforming, based on which we then jointly optimize the active-IRS elements allocation and placement by using the alternating optimization (AO) method. Moreover, we show that given the fixed per-element amplification power, the received signal-to-noise ratio (SNR) at the user increases asymptotically with the square of the number of reflecting elements; while given the fixed number of reflecting elements, the SNR does not increase with the per-element amplification power when it is asymptotically large. Last, numerical results are presented to validate the effectiveness of the proposed AO-based algorithm and compare the rate performance of the considered double-active-IRS aided wireless system with various benchmark systems.

Submitted 23 July, 2023; originally announced July 2023.

arXiv:2307.01445 [pdf, ps, other]

Distributed fusion filter over lossy wireless sensor networks with the presence of non-Gaussian noise

Authors: Jiacheng He, Bei Peng, Zhenyu Feng, Xuemei Mao, Song Gao, Gang Wang

Abstract: Information transmission between nodes in wireless sensor networks (WSNs) often suffers packet loss due to denial-of-service (DoS) attacks, energy limitations, and environmental factors, and the information that is successfully transmitted can also be contaminated by non-Gaussian noise. The presence of these two factors poses a challenge for distributed state estimation (DSE) over WSNs. In this paper, a generalized packet drop model is proposed to describe the packet loss phenomenon caused by DoS attacks and other factors. Moreover, a modified maximum correntropy Kalman filter is given, and it is extended to distributed form (DM-MCKF). In addition, a distributed modified maximum correntropy Kalman filter incorporating the generalized data packet drop (DM-MCKF-DPD) algorithm is provided to implement DSE in the presence of both non-Gaussian noise pollution and packet drop. A sufficient condition to ensure the convergence of the fixed-point iterative process of the DM-MCKF-DPD algorithm is presented and the computational complexity of the DM-MCKF-DPD algorithm is analyzed. Finally, the effectiveness and feasibility of the proposed algorithms are verified by simulations.

Submitted 6 July, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

arXiv:2307.00511 [pdf]

SUGAR: Spherical Ultrafast Graph Attention Framework for Cortical Surface Registration

Authors: Jianxun Ren, Ning An, Youjia Zhang, Danyang Wang, Zhenyu Sun, Cong Lin, Weigang Cui, Weiwei Wang, Ying Zhou, Wei Zhang, Qingyu Hu, ** Zhang, Dan Hu, Danhong Wang, Hesheng Liu

Abstract: Cortical surface registration plays a crucial role in aligning cortical functional and anatomical features across individuals. However, conventional registration algorithms are computationally inefficient. Recently, learning-based registration algorithms have emerged as a promising solution, significantly improving processing efficiency. Nonetheless, there remains a gap in the development of a learning-based method that exceeds the state-of-the-art conventional methods simultaneously in computational efficiency, registration accuracy, and distortion control, despite the theoretically greater representational capabilities of deep learning approaches. To address the challenge, we present SUGAR, a unified unsupervised deep-learning framework for both rigid and non-rigid registration. SUGAR incorporates a U-Net-based spherical graph attention network and leverages the Euler angle representation for deformation. In addition to the similarity loss, we introduce fold and multiple distortion losses, to preserve topology and minimize various types of distortions. Furthermore, we propose a data augmentation strategy specifically tailored for spherical surface registration, enhancing the registration performance. Through extensive evaluation involving over 10,000 scans from 7 diverse datasets, we showed that our framework exhibits comparable or superior registration performance in accuracy, distortion, and test-retest reliability compared to conventional and learning-based methods. Additionally, SUGAR achieves remarkable sub-second processing times, offering a notable speed-up of approximately 12,000 times in registering 9,000 subjects from the UK Biobank dataset in just 32 minutes. This combination of high registration performance and accelerated processing time may greatly benefit large-scale neuroimaging studies.

Submitted 2 July, 2023; originally announced July 2023.

arXiv:2306.12898 [pdf]

Machine-Learning-Assisted and Real-Time-Feedback-Controlled Growth of InAs/GaAs Quantum Dots

Authors: Chao Shen, Wenkang Zhan, Kaiyao Xin, Manyang Li, Zhenyu Sun, Hui Cong, Chi Xu, Jian Tang, Zhaofeng Wu, Bo Xu, Zhongming Wei, Chunlai Xue, Chao Zhao, Zhanguo Wang

Abstract: Self-assembled InAs/GaAs quantum dots (QDs) have properties highly valuable for developing various optoelectronic devices such as QD lasers and single photon sources. The applications strongly rely on the density and quality of these dots, which has motivated studies of the growth process control to realize high-quality epi-wafers and devices. Establishing the process parameters in molecular beam epitaxy (MBE) for a specific density of QDs is a multidimensional optimization challenge, usually addressed through time-consuming and iterative trial-and-error. Here, we report a real-time feedback control method to realize the growth of QDs with arbitrary density, which is fully automated and intelligent. We developed a machine learning (ML) model named 3D ResNet 50 trained using reflection high-energy electron diffraction (RHEED) videos as input instead of static images and providing real-time feedback on surface morphologies for process control. As a result, we demonstrated that ML from previous growth could predict the post-growth density of QDs, by successfully tuning the QD densities in near-real time from 1.5E10 cm⁻² down to 3.8E8 cm⁻² or up to 1.4E11 cm⁻². Compared to traditional methods, our approach, with in situ tuning capabilities and excellent reliability, can dramatically expedite the material optimization process and improve the reproducibility of MBE, constituting significant progress for thin film growth techniques. The concepts and methodologies proved feasible in this work are promising to be applied to a variety of material growth processes, which will revolutionize semiconductor manufacturing for optoelectronic and microelectronic industries.

Submitted 11 October, 2023; v1 submitted 22 June, 2023; originally announced June 2023.

Comments: 5 figures

arXiv:2306.10275 [pdf, other]

Multi-Scale Simulation of Complex Systems: A Perspective of Integrating Knowledge and Data

Authors: Huandong Wang, Huan Yan, Can Rong, Yuan Yuan, Fenyu Jiang, Zhenyu Han, Hongjie Sui, Depeng **, Yong Li

Abstract: Complex system simulation has been playing an irreplaceable role in understanding, predicting, and controlling diverse complex systems. In the past few decades, the multi-scale simulation technique has drawn increasing attention for its remarkable ability to overcome the challenges of complex system simulation with unknown mechanisms and expensive computational costs. In this survey, we will systematically review the literature on multi-scale simulation of complex systems from the perspective of knowledge and data. Firstly, we will present background knowledge about complex system simulation and the scales in complex systems. Then, we divide the main objectives of multi-scale modeling and simulation into five categories by considering scenarios with clear scale and scenarios with unclear scale, respectively. After summarizing the general methods for multi-scale simulation based on the clues of knowledge and data, we introduce the adopted methods to achieve different objectives. Finally, we introduce the applications of multi-scale simulation in typical matter systems and social systems.

Submitted 17 June, 2023; originally announced June 2023.

arXiv:2306.05581 [pdf, other]

Risk-aware Urban Air Mobility Network Design with Overflow Redundancy

Authors: Qinshuang Wei, Zhenyu Gao, John-Paul Clarke, Ufuk Topcu

Abstract: Urban air mobility (UAM), as envisioned by aviation professionals, will transport passengers and cargo at low altitudes within urban and suburban areas. To operate in urban environments, precise air traffic management, in particular the management of traffic overflows due to physical and operational disruptions, will be critical to ensuring system safety and efficiency. To this end, we propose UAM network design with reserve capacity, i.e., a design where alternative landing options and flight corridors are explicitly considered as a means of improving contingency management. Similar redundancy considerations are incorporated in the design of many critical infrastructures, yet remain unexploited in the air transportation literature. In our methodology, we first model how disruptions to a given UAM network might impact the nominal traffic flow and how this flow might be re-accommodated on an extended network with reserve capacity. Then, through an optimization problem, we select the locations and capacities for the backup vertiports with the maximal expected throughput of the extended network over all possible disruption scenarios, where the throughput is the maximal number of flights that the network can accommodate per unit of time. We show that we can obtain the solution for the corresponding bi-level and bi-linear optimization problem by solving a mixed-integer linear program. We demonstrate our methodology in the case study using networks from Milwaukee, Atlanta, and Dallas--Fort Worth metropolitan areas and show how the throughput and flexibility of the UAM networks with reserve capacity can outcompete those without.

Submitted 23 October, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

Comments: 44 pages, 10 figures

arXiv:2305.16445 [pdf, other]

SoundSieve: Seconds-Long Audio Event Recognition on Intermittently-Powered Systems

Authors: Mahathir Monjur, Yubo Luo, Zhenyu Wang, Shahriar Nirjon

Abstract: A fundamental problem of every intermittently-powered sensing system is that signals acquired by these systems over a longer period in time are also intermittent. As a consequence, these systems fail to capture parts of a longer-duration event that spans over multiple charge-discharge cycles of the capacitor that stores the harvested energy. From an application's perspective, this is viewed as sporadic bursts of missing values in the input data -- which may not be recoverable using statistical interpolation or imputation methods. In this paper, we study this problem in the light of an intermittent audio classification system and design an end-to-end system -- SoundSieve -- that is capable of accurately classifying audio events that span multiple on-off cycles of the intermittent system. SoundSieve employs an offline audio analyzer that learns to identify and predict important segments of an audio clip that must be sampled to ensure accurate classification of the audio. At runtime, SoundSieve employs a lightweight, energy- and content-aware audio sampler that decides when the system should wake up to capture the next chunk of audio; and a lightweight, intermittence-aware audio classifier that performs imputation and on-device inference. Through extensive evaluations using popular audio datasets as well as real systems, we demonstrate that SoundSieve yields 5%--30% more accurate inference results than the state-of-the-art.

Submitted 25 May, 2023; originally announced May 2023.

Comments: The 21st ACM International Conference on Mobile Systems, Applications, and Services (Mobisys 2023)

arXiv:2305.06806 [pdf, other]

HappyQuokka System for ICASSP 2023 Auditory EEG Challenge

Authors: Zhenyu Piao, Miseul Kim, Hyungchan Yoon, Hong-Goo Kang

Abstract: This report describes our submission to Task 2 of the Auditory EEG Decoding Challenge at ICASSP 2023 Signal Processing Grand Challenge (SPGC). Task 2 is a regression problem that focuses on reconstructing a speech envelope from an EEG signal. For the task, we propose a pre-layer normalized feed-forward transformer (FFT) architecture. For within-subjects generation, we additionally utilize an auxiliary global conditioner which provides our model with additional information about seen individuals. Experimental results show that our proposed method outperforms the VLAAI baseline and all other submitted systems. Notably, it demonstrates significant improvements on the within-subjects task, likely thanks to our use of the auxiliary global conditioner. In terms of evaluation metrics set by the challenge, we obtain Pearson correlation values of 0.1895 ± 0.0869 for the within-subjects generation test and 0.0976 ± 0.0444 for the heldout-subjects test. We release the training code for our model online.

Submitted 3 May, 2023; originally announced May 2023.

Comments: First Place in Task 2 of Auditory EEG decoding Challenge, which is part of ICASSP Signal Processing Grand Challenge (SPGC) 2023
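
The HappyQuokka abstract above reports its results as Pearson correlation values between the reconstructed and the reference speech envelope. As a minimal sketch of how such a score can be computed, assuming two equal-length sequences of envelope samples (the function name `pearson_correlation` and the toy inputs are illustrative and are not taken from the challenge code):

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Unnormalized covariance and variances; the 1/n factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy envelopes, for illustration only.
reference = [0.1, 0.4, 0.35, 0.8, 0.6]
predicted = [0.2, 0.3, 0.4, 0.7, 0.5]
score = pearson_correlation(reference, predicted)
```

A score near 1 would indicate that the predicted envelope rises and falls with the reference; values around 0.1 to 0.2, as reported in the abstract, are small in absolute terms.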

arXiv:2304.13471 [pdf, other]

OPDN: Omnidirectional Position-aware Deformable Network for Omnidirectional Image Super-Resolution

Authors: Xiaopeng Sun, Weiqi Li, Zhenyu Zhang, Qiufang Ma, Xuhan Sheng, Ming Cheng, Haoyu Ma, Shijie Zhao, Jian Zhang, Junlin Li, Li Zhang

Abstract: 360° omnidirectional images have gained research attention due to their immersive and interactive experience, particularly in AR/VR applications. However, they suffer from lower angular resolution due to being captured by fisheye lenses with the same sensor size for capturing planar images. To solve the above issues, we propose a two-stage framework for 360° omnidirectional image super-resolution. The first stage employs two branches: model A, which incorporates omnidirectional position-aware deformable blocks (OPDB) and Fourier upsampling, and model B, which adds a spatial frequency fusion module (SFF) to model A. Model A aims to enhance the feature extraction ability of 360° image positional information, while Model B further focuses on the high-frequency information of 360° images. The second stage performs same-resolution enhancement based on the structure of model A with a pixel unshuffle operation. In addition, we collected data from YouTube to improve the fitting ability of the transformer, and created pseudo low-resolution images using a degradation network. Our proposed method achieves superior performance and wins the NTIRE 2023 challenge of 360° omnidirectional image super-resolution.

Submitted 26 April, 2023; originally announced April 2023.

Comments: Accepted to CVPRW 2023

arXiv:2304.04773 [pdf, other]

HDR Video Reconstruction with a Large Dynamic Dataset in Raw and sRGB Domains

Authors: Huanjing Yue, Yubo Peng, Biting Yu, Xuanwu Yin, Zhenyu Zhou, Jingyu Yang

Abstract: High dynamic range (HDR) video reconstruction is attracting more and more attention due to the superior visual quality compared with those of low dynamic range (LDR) videos. The availability of LDR-HDR training pairs is essential for the HDR reconstruction quality. However, there are still no real LDR-HDR pairs for dynamic scenes due to the difficulty in capturing LDR-HDR frames simultaneously. In this work, we propose to utilize a staggered sensor to capture two alternate exposure images simultaneously, which are then fused into an HDR frame in both raw and sRGB domains. In this way, we build a large scale LDR-HDR video dataset with 85 scenes and each scene contains 60 frames. Based on this dataset, we further propose a Raw-HDRNet, which utilizes the raw LDR frames as inputs. We propose a pyramid flow-guided deformation convolution to align neighboring frames. Experimental results demonstrate that 1) the proposed dataset can improve the HDR reconstruction performance on real scenes for three benchmark networks; 2) compared with sRGB inputs, utilizing raw inputs can further improve the reconstruction quality and our proposed Raw-HDRNet is a strong baseline for raw HDR reconstruction. Our dataset and code will be released after the acceptance of this paper.

Submitted 12 April, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

arXiv:2304.00070 [pdf, ps, other]

HybridCVLNet: A Hybrid CSI Feedback System and its Domain Adaptation

Authors: Haozhen Li, Xinyu Gu, Boyuan Zhang, Dongliang Li, Zhenyu Liu, Lin Zhang

Abstract: Deep Learning (DL)-based channel state information (CSI) feedback is a promising technique for the transmitter to accurately acquire the CSI of massive multiple-input multiple-output (MIMO) systems. As critical concerns about DL-based physical layer applications, the intra-domain generalizability affected by dataset bias and inter-domain robustness in data drift remain challenging. Therefore, we build on a Hybrid Complex-Valued Lightweight framework, namely the HybridCVLNet, capable of overcoming the dataset bias with regularized hybrid structure and codeword. Meanwhile, a corresponding transductive-based hybrid domain adaptation scheme is proposed to tackle the inter-domain data drift. The experiment verifies that HybridCVLNet achieves stable generalizability and performance gain over the state-of-the-art (SOTA) feedback schemes in an intra-domain heterogeneous dataset. In addition, its transductive-based hybrid domain adaptation scheme is more efficient and superior to the inductive-based transfer learning methods under two inter-domain online re-optimization settings.

Submitted 3 May, 2023; v1 submitted 30 March, 2023; originally announced April 2023.

arXiv:2303.04857 [pdf, other]

doi 10.1109/LRA.2024.3384908

Breaking Symmetries Leads to Diverse Quadrupedal Gaits

Authors: Jiayu Ding, Zhenyu Gan

Abstract: Symmetry manifests itself in legged locomotion in a variety of ways. No matter where a legged system begins to move periodically, the torso and limbs coordinate with each other's movements in a similar manner. Also, in many gaits observed in nature, the legs on both sides of the torso move in exactly the same way, although sometimes they are half a period out of phase. Furthermore, when some animals move forward and backward, their movements are strikingly similar, as if time had been reversed. This work aims to generalize these phenomena and propose formal definitions of symmetries in legged locomotion using group theory terminology. Symmetries in some common quadrupedal gaits such as pronking, bounding, half-bounding, and galloping have been discussed. Moreover, a spring-mass model has been used to demonstrate how breaking symmetries can alter gaits in a legged system. Studying the symmetries may provide insight into which gaits may be suitable for a particular robotic design, or may enable roboticists to design more agile and efficient robot controllers by using certain gaits.

Submitted 8 April, 2024; v1 submitted 8 March, 2023; originally announced March 2023.

Comments: Please refer to the published version to cite this paper

Journal ref: IEEE Robotics and Automation Letters, Institute of Electrical and Electronics Engineers (IEEE), 2024

arXiv:2302.09257 [pdf, other]

mmWave Coverage Extension Using Reconfigurable Intelligent Surfaces in Indoor Dense Spaces

Authors: Zhenyu Li, Ozan Alp Topal, Özlem Tuğfe Demir, Emil Björnson, Cicek Cavdar

Abstract: In this work, we consider the deployment of reconfigurable intelligent surfaces (RISs) to extend the coverage of a millimeter-wave (mmWave) network in indoor dense spaces. We first integrate RIS into ray-tracing simulations to realistically capture the propagation characteristics, then formulate a non-convex optimization problem that minimizes the number of RISs under rate constraints. We propose a feasible point pursuit and successive convex approximation-based algorithm, which solves the problem by jointly selecting the RIS locations, optimizing the RIS phase-shifts, and allocating time resources to user equipments (UEs). The numerical results demonstrate substantial coverage extension by using at least four RISs, and a data rate of 130 Mbit/s is guaranteed for UEs in the considered area of an airplane cabin.

Submitted 18 February, 2023; originally announced February 2023.

Comments: 6 pages 8 figures. Accepted to be presented in IEEE ICC 2023

arXiv:2302.06980 [pdf, other]

doi 10.1109/TCCN.2023.3324634

Deep Learning-Based Modeling of 5G Core Control Plane for 5G Network Digital Twin

Authors: Zhenyu Tao, Yongliang Guo, Guanghui He, Yongming Huang, Xiaohu You

Abstract: Digital twin serves as a crucial facilitator in the advancement and implementation of emerging technologies within 5G and beyond networks. However, the intricate structure and diverse functionalities of the existing 5G core network, especially the control plane, present challenges in constructing core network digital twins. In this paper, we propose two novel data-driven architectures for modeling the 5G control plane and implement corresponding deep learning models, namely 5GC-Seq2Seq and 5GC-former, based on the Vanilla Seq2Seq model and Transformer decoder respectively. We also present a solution enabling the interconversion of signaling messages and length-limited vectors to construct a dataset. The experiments are based on 5G core network signaling messages collected by the Spirent C50 network tester, encompassing various procedures such as registration, handover, and PDU sessions. The results show that 5GC-Seq2Seq achieves a 99.997% F1-score (a metric measuring the accuracy of positive samples) in single UE scenarios with a simple structure, but exhibits significantly reduced performance in handling concurrency. In contrast, 5GC-former surpasses 99.999% F1-score while maintaining robust performance under concurrent UE scenarios by constructing a more complex and highly parallel model. These findings validate that our method accurately replicates the principal functionalities of the 5G core network control plane.

Submitted 18 October, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

Journal ref: IEEE Transactions on Cognitive Communications and Networking
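
The 5G core control plane abstract above quotes F1-scores as its headline metric. A minimal sketch of the standard F1 definition (the harmonic mean of precision and recall), assuming counts of true positives, false positives, and false negatives are already available; the function name `f1_score` and the example counts are illustrative and do not come from the paper:

```python
def f1_score(true_positives, false_positives, false_negatives):
    """F1-score: harmonic mean of precision and recall."""
    if true_positives == 0:
        # With no true positives, precision and recall are zero or undefined;
        # returning 0.0 is a common convention.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
example = f1_score(true_positives=8, false_positives=2, false_negatives=2)
```

Because F1 ignores true negatives, it reflects how well the positive class is recovered, which matches the abstract's parenthetical description of the metric.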

arXiv:2301.05351 [pdf, other]

Data-driven Moving Horizon Estimation for Angular Velocity of Space Noncooperative Target in Eddy Current De-tumbling Mission

Authors: Xiyao Liu, Haitao Chang, Zhenyu Lu, Panfeng Huang

Abstract: Angular velocity estimation is critical for eddy current de-tumbling of noncooperative space targets. However, the unknown model of the noncooperative target and the scarcity of observation data pose a challenge to model-based estimation methods. In this paper, a Data-driven Moving Horizon Estimation method is proposed to estimate the angular velocity of the noncooperative target with de-tumbling torque. In this method, model-free state estimation of the angular velocity can be achieved using only one historical trajectory data that satisfies the rank condition. With local linear approximation, the Willems fundamental lemma is extended to nonlinear autonomous systems, and the rank condition for the historical trajectory data is deduced. Then, a data-driven moving horizon estimation algorithm based on the M step Lyapunov function is designed, and the time-discount robust stability of the algorithm is given. In order to illustrate the effectiveness of the proposed algorithm, experiments and simulations are performed to estimate the angular velocity in eddy current de-tumbling with only de-tumbling torque measurement.

Submitted 12 January, 2023; originally announced January 2023.

arXiv:2301.04311 [pdf, other]

Active-IRS-Aided Wireless Communication: Fundamentals, Designs and Open Issues

Authors: Zhenyu Kang, Changsheng You, Rui Zhang

Abstract: Intelligent reflecting surface (IRS) has emerged as a promising technology to realize smart radio environment for future wireless communication systems. Existing works in this line of research have mainly considered the conventional passive IRS that reflects wireless signals without power amplification, while in this article, we give an overview of a new type of IRS, called active IRS, which enables simultaneous signal reflection and amplification, thus significantly extending the signal coverage of passive IRS. We first present the fundamentals of active IRS, including its hardware architecture, signal and channel models, as well as practical constraints, in comparison with those of passive IRS. Then, we discuss new considerations and open issues in designing active-IRS-aided wireless communications, such as the reflection optimization, channel estimation, and deployment for active IRS, as well as its integrated design with passive IRS. Finally, numerical results are provided to show the potential performance gains of active IRS as compared to passive IRS and traditional active relay.

Submitted 25 June, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

arXiv:2212.05360 [pdf, other]

Synthetic Wave-Geometric Impulse Responses for Improved Speech Dereverberation

Authors: Rohith Aralikatti, Zhenyu Tang, Dinesh Manocha

Abstract: We present a novel approach to improve the performance of learning-based speech dereverberation using accurate synthetic datasets. Our approach is designed to recover the reverb-free signal from a reverberant speech signal. We show that accurately simulating the low-frequency components of Room Impulse Responses (RIRs) is important to achieving good dereverberation. We use the GWA dataset that consists of synthetic RIRs generated in a hybrid fashion: an accurate wave-based solver is used to simulate the lower frequencies and geometric ray tracing methods simulate the higher frequencies. We demonstrate that speech dereverberation models trained on hybrid synthetic RIRs outperform models trained on RIRs generated by prior geometric ray tracing methods on four real-world RIR datasets.

Submitted 10 December, 2022; originally announced December 2022.

Comments: Submitted to ICASSP 2023

arXiv:2211.15995 [pdf]

Shadow-Oriented Tracking Method for Multi-Target Tracking in Video-SAR

Authors: Xiaochuan Ni, Xiaoling Zhang, Xu Zhan, Zhenyu Yang, Jun Shi, Shunjun Wei, Tianjiao Zeng

Abstract: This work focuses on multi-target tracking in video synthetic aperture radar (Video-SAR). Specifically, we refer to tracking based on targets' shadows. Current methods have limited accuracy as they fail to fully consider the shadows' characteristics and surroundings. Shadows are low-scattering and varied, resulting in missed tracking. Surroundings can cause interference, resulting in false tracking. To solve these problems, we propose a shadow-oriented multi-target tracking method (SOTrack). To avoid false tracking, a pre-processing module is proposed to enhance shadows from surroundings, thus reducing their interference. To avoid missed tracking, a detection method based on deep learning is designed to thoroughly learn shadows' features, thus increasing the accurate estimation. Further, a recall module is designed to recall missed shadows. We conduct experiments on measured data. Results demonstrate that, compared with other methods, SOTrack achieves much higher performance in tracking accuracy-18.4%. An ablation study confirms the effectiveness of the proposed modules.

Submitted 29 November, 2022; originally announced November 2022.

arXiv:2211.14785 [pdf, ps, other]

Deep Learning for Efficient CSI Feedback in Massive MIMO: Adapting to New Environments and Small Datasets

Authors: Zhenyu Liu, Li Wang, Lianming Xu, Zhi Ding

Abstract: Deep learning (DL)-based channel state information (CSI) feedback has shown promising potential to improve spectrum efficiency in massive MIMO systems. However, practical DL approaches require a sizeable CSI dataset for each scenario, and require large storage or updating bandwidth for multiple learned models. To overcome this costly barrier, we develop a solution for efficient training and deployment enhancement of DL-based CSI feedback by exploiting a lightweight translation model to cope with new CSI environments and by proposing novel dataset augmentation based on domain knowledge. Specifically, we first develop a deep unfolding CSI feedback network, SPTM2-ISTANet+, which employs spherical normalization to address the challenge of path loss variation. We also introduce an integration of a trainable measurement matrix and residual CSI recovery blocks within SPTM2-ISTANet+ to improve efficiency and accuracy. Using SPTM2-ISTANet+ as the anchor feedback model, we propose an efficient scenario-adaptive CSI feedback architecture. This new CSI-TransNet exploits a plug-in module for CSI translation consisting of a sparsity aligning function and lightweight DL module to reuse pretrained models in unseen environments. To work with small datasets, we propose a lightweight and general augmentation strategy based on domain knowledge. Test results demonstrate the efficacy and efficiency of the proposed solution for accurate CSI feedback given limited measurements for unseen CSI environments.

Submitted 6 November, 2023; v1 submitted 27 November, 2022; originally announced November 2022.

Comments: 13 pages, 11 figures, 6 tables

arXiv:2211.12671 [pdf, other]

3-D Positioning and Resource Allocation for Multi-UAV Base Stations Under Blockage-Aware Channel Model

Authors: Pengfei Yi, Lipeng Zhu, Zhenyu Xiao, Rui Zhang, Zhu Han, Xiang-Gen Xia

Abstract: In this paper, we propose to deploy multiple unmanned aerial vehicle (UAV) mounted base stations to serve ground users in outdoor environments with obstacles. In particular, the geographic information is employed to capture the blockage effects for air-to-ground (A2G) links caused by buildings, and a realistic blockage-aware A2G channel model is proposed to characterize the continuous variation of the channels at different locations. Based on the proposed channel model, we formulate the joint optimization problem of UAV three-dimensional (3-D) positioning and resource allocation, by power allocation, user association, and subcarrier allocation, to maximize the minimum achievable rate among users. To solve this non-convex combinatorial programming problem, we introduce a penalty term to relax it and develop a suboptimal solution via a penalty-based double-loop iterative optimization framework. The inner loop solves the penalized problem by employing the block successive convex approximation (BSCA) technique, where the UAV positioning and resource allocation are alternately optimized in each iteration. The outer loop aims to obtain proper penalty multipliers to ensure the solution of the penalized problem converges to that of the original problem. Simulation results demonstrate the superiority of the proposed algorithm over other benchmark schemes in terms of the minimum achievable rate.

Submitted 22 November, 2022; originally announced November 2022.

arXiv:2211.09913 [pdf, other]

doi 10.1109/TASLP.2021.3130975

Multi-source Domain Adaptation for Text-independent Forensic Speaker Recognition

Authors: Zhenyu Wang, John H. L. Hansen

Abstract: Adapting speaker recognition systems to new environments is a widely-used technique to improve a well-performing model learned from large-scale data towards task-specific small-scale data scenarios. However, previous studies focus on single domain adaptation, which neglects a more practical scenario where training data are collected from multiple acoustic domains needed in forensic scenarios. Audio analysis for forensic speaker recognition offers unique challenges in model training with multi-domain training data due to location/scenario uncertainty and diversity mismatch between reference and naturalistic field recordings. It is also difficult to directly employ small-scale domain-specific data to train complex neural network architectures due to domain mismatch and performance loss. Fine-tuning is a commonly-used method for adaptation in order to retrain the model with weights initialized from a well-trained model. Alternatively, in this study, three novel adaptation methods based on domain adversarial training, discrepancy minimization, and moment-matching approaches are proposed to further promote adaptation performance across multiple acoustic domains. A comprehensive set of experiments is conducted to demonstrate that: 1) diverse acoustic environments do impact speaker recognition performance, which could advance research in audio forensics, 2) domain adversarial training learns the discriminative features which are also invariant to shifts between domains, 3) discrepancy-minimizing adaptation achieves effective performance simultaneously across multiple acoustic domains, and 4) moment-matching adaptation along with dynamic distribution alignment also significantly promotes speaker recognition performance on each domain, especially for the LENA-field domain with noise compared to all other systems.

Submitted 17 November, 2022; originally announced November 2022.

Comments: IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING

arXiv:2211.09898 [pdf, other]

doi 10.21437/Interspeech.2022-904

Audio Anti-spoofing Using a Simple Attention Module and Joint Optimization Based on Additive Angular Margin Loss and Meta-learning

Authors: Zhenyu Wang, John H. L. Hansen

Abstract: Automatic speaker verification systems are vulnerable to a variety of access threats, prompting research into the formulation of effective spoofing detection systems to act as a gate to filter out such spoofing attacks. This study introduces a simple attention module to infer 3-dim attention weights for the feature map in a convolutional layer, which then optimizes an energy function to determine each neuron's importance. With the advancement of both voice conversion and speech synthesis technologies, unseen spoofing attacks are constantly emerging to limit spoofing detection system performance. Here, we propose a joint optimization approach based on the weighted additive angular margin loss for binary classification, with a meta-learning training framework to develop an efficient system that is robust to a wide range of spoofing attacks for model generalization enhancement. As a result, when compared to current state-of-the-art systems, our proposed approach delivers a competitive result with a pooled EER of 0.99% and min t-DCF of 0.0289.

Submitted 17 November, 2022; originally announced November 2022.

Comments: Interspeech 2022

arXiv:2211.04470 [pdf, other]

Efficient Single-Image Depth Estimation on Mobile Devices, Mobile AI & AIM 2022 Challenge: Report

Authors: Andrey Ignatov, Grigory Malivenko, Radu Timofte, Lukasz Treszczotko, Xin Chang, Piotr Ksiazek, Michal Lopuszynski, Maciej Pioro, Rafal Rudnicki, Maciej Smyl, Yujie Ma, Zhenyu Li, Zehui Chen, Jialei Xu, Xianming Liu, Junjun Jiang, XueChao Shi, Difan Xu, Yanan Li, Xiaotao Wang, Lei Lei, Ziyu Zhang, Yicheng Wang, Zilong Huang, Guozhong Luo , et al. (14 additional authors not shown)

Abstract: Various depth estimation models are now widely used on many mobile and IoT devices for image segmentation, bokeh effect rendering, object tracking and many other mobile tasks. Thus, it is crucial to have efficient and accurate depth estimation models that can run fast on low-power mobile chipsets. In this Mobile AI challenge, the target was to develop deep learning-based single image depth estimation solutions that can show real-time performance on IoT platforms and smartphones. For this, the participants used a large-scale RGB-to-depth dataset that was collected with the ZED stereo camera, which is capable of generating depth maps for objects located at up to 50 meters. The runtime of all models was evaluated on the Raspberry Pi 4 platform, where the developed solutions were able to generate VGA resolution depth maps at up to 27 FPS while achieving high fidelity results. All models developed in the challenge are also compatible with any Android or Linux-based mobile devices; their detailed description is provided in this paper.

Submitted 7 November, 2022; originally announced November 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2105.08630, arXiv:2211.03885; text overlap with arXiv:2105.08819, arXiv:2105.08826, arXiv:2105.08629, arXiv:2105.07809, arXiv:2105.07825

arXiv:2210.08493 [pdf, other]

Indoor Smartphone SLAM with Learned Echoic Location Features

Authors: Wenjie Luo, Qun Song, Zhenyu Yan, Rui Tan, Guosheng Lin

Abstract: Indoor self-localization is a highly demanded system function for smartphones. The current solutions based on inertial, radio frequency, and geomagnetic sensing may have degraded performance when their limiting factors take effect. In this paper, we present a new indoor simultaneous localization and mapping (SLAM) system that utilizes the smartphone's built-in audio hardware and inertial measurement unit (IMU). Our system uses a smartphone's loudspeaker to emit near-inaudible chirps and then the microphone to record the acoustic echoes from the indoor environment. Our profiling measurements show that the echoes carry location information with sub-meter granularity. To enable SLAM, we apply contrastive learning to construct an echoic location feature (ELF) extractor, such that the loop closures on the smartphone's trajectory can be accurately detected from the associated ELF trace. The detection results effectively regulate the IMU-based trajectory reconstruction. Extensive experiments show that our ELF-based SLAM achieves median localization errors of 0.1 m, 0.53 m, and 0.4 m on the reconstructed trajectories in a living room, an office, and a shopping mall, and outperforms the Wi-Fi and geomagnetic SLAM systems.

Submitted 16 October, 2022; originally announced October 2022.

arXiv:2210.06512 [pdf]

Quantifying U-Net Uncertainty in Multi-Parametric MRI-based Glioma Segmentation by Spherical Image Projection

Authors: Zhenyu Yang, Kyle Lafata, Eugene Vaios, Zongsheng Hu, Trey Mullikin, Fang-Fang Yin, Chunhao Wang

Abstract: The projection of planar MRI data onto a spherical surface is equivalent to a nonlinear image transformation that retains global anatomical information. By incorporating this image transformation process in our proposed spherical projection-based U-Net (SPU-Net) segmentation model design, multiple independent segmentation predictions can be obtained from a single MRI. The final segmentation is the average of all available results, and the variation can be visualized as a pixel-wise uncertainty map. An uncertainty score was introduced to evaluate and compare the performance of uncertainty measurements. The proposed SPU-Net model was implemented on the basis of 369 glioma patients with MP-MRI scans (T1, T1-Ce, T2, and FLAIR). Three SPU-Net models were trained to segment enhancing tumor (ET), tumor core (TC), and whole tumor (WT), respectively. The SPU-Net model was compared with (1) the classic U-Net model with test-time augmentation (TTA) and (2) linear scaling-based U-Net (LSU-Net) segmentation models in terms of both segmentation accuracy (Dice coefficient, sensitivity, specificity, and accuracy) and segmentation uncertainty (uncertainty map and uncertainty score). The developed SPU-Net model successfully achieved low uncertainty for correct segmentation predictions (e.g., tumor interior or healthy tissue interior) and high uncertainty for incorrect results (e.g., tumor boundaries). This model could allow the identification of missed tumor targets or segmentation errors in U-Net.
Quantitatively, the SPU-Net model achieved the highest uncertainty scores for three segmentation targets (ET/TC/WT): 0.826/0.848/0.936, compared to 0.784/0.643/0.872 using the U-Net with TTA and 0.743/0.702/0.876 with the LSU-Net (scaling factor = 2). The SPU-Net also achieved statistically significantly higher Dice coefficients, underscoring the improved segmentation accuracy.

Submitted 12 August, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

Comments: 31 pages, 9 figures, 1 table

Showing 1–50 of 146 results for author: Zhenyu