-
A Note On the Clark Conjecture On Time-Warped Bandlimited Signals
Authors:
Xiang-Gen Xia
Abstract:
In this note, a result of a previous paper on the Clark conjecture on time-warped bandlimited signals is extended to a more general class of time warping functions, which includes most of the common functions in practice.
Submitted 1 July, 2024;
originally announced July 2024.
-
An I2I Inpainting Approach for Efficient Channel Knowledge Map Construction
Authors:
Zhenzhou **,
Li You,
Jue Wang,
Xiang-Gen Xia,
Xiqi Gao
Abstract:
Channel knowledge map (CKM) has received widespread attention as an emerging enabling technology for environment-aware wireless communications. It involves the construction of databases containing location-specific channel knowledge, which are then leveraged to facilitate channel state information (CSI) acquisition and transceiver design. In this context, a fundamental challenge lies in efficiently constructing the CKM based on a given wireless propagation environment. Most existing methods are based on stochastic modeling and sequence prediction, which do not fully exploit the inherent physical characteristics of the propagation environment, resulting in low accuracy and high computational complexity. To address these limitations, we propose a Laplacian pyramid (LP)-based CKM construction scheme to predict the channel knowledge at arbitrary locations in a targeted area. Specifically, we first view the channel knowledge as a 2-D image and transform the CKM construction problem into an image-to-image (I2I) inpainting task, which predicts the channel knowledge at a specific location by recovering the corresponding pixel value in the image matrix. Then, inspired by the reversible and closed-form structure of the LP, we show its natural suitability for our task in designing a fast I2I mapping network. For different frequency components of LP decomposition, we design tailored networks accordingly. Besides, to encode the global structural information of the propagation environment, we introduce self-attention and cross-covariance attention mechanisms in different layers, respectively. Finally, experimental results show that the proposed scheme outperforms the benchmark, achieving higher reconstruction accuracy with lower computational complexity. Moreover, the proposed approach has a strong generalization ability and can be implemented in different wireless communication scenarios.
Submitted 14 June, 2024;
originally announced June 2024.
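The abstract above relies on the Laplacian pyramid being reversible and closed-form. A minimal sketch of that property follows, assuming 2x2 average pooling for downsampling and nearest-neighbour repetition for upsampling; these filters and the function names are illustrative assumptions, not the networks or filters used in the paper.

```python
import numpy as np

def downsample(img):
    """Halve each dimension by 2x2 average pooling (assumed filter)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double each dimension by nearest-neighbour repetition (assumed filter)."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def lp_decompose(img, levels):
    """Return [high_0, ..., high_{levels-1}, low]; sides must be divisible by 2**levels."""
    bands = []
    current = img.astype(float)
    for _ in range(levels):
        low = downsample(current)
        # High-frequency residual: what the coarse image cannot represent.
        bands.append(current - upsample(low))
        current = low
    bands.append(current)
    return bands

def lp_reconstruct(bands):
    """Invert lp_decompose in closed form by adding the residuals back."""
    current = bands[-1]
    for high in reversed(bands[:-1]):
        current = high + upsample(current)
    return current

# Usage: an 8x8 "channel map" is recovered from its pyramid.
rng = np.random.default_rng(0)
image = rng.random((8, 8))
bands = lp_decompose(image, levels=2)
restored = lp_reconstruct(bands)
```

Because each residual is defined as the difference between an image and its upsampled coarse version, reconstruction holds for any choice of down/upsampling filters, which is why the decomposition can be paired with separately designed per-band networks.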
-
RaD-Net 2: A causal two-stage repairing and denoising speech enhancement network with knowledge distillation and complex axial self-attention
Authors:
Mingshuai Liu,
Zhuangqi Chen,
Xiaopeng Yan,
Yuanjun Lv,
Xianjun Xia,
Chuanzeng Huang,
Yijian Xiao,
Lei Xie
Abstract:
In real-time speech communication systems, speech signals are often degraded by multiple distortions. Recently, a two-stage Repair-and-Denoising network (RaD-Net) was proposed with superior speech quality improvement in the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. However, its failure to use future information and the constrained receptive field of its convolution layers limit the system's performance. To mitigate these problems, we extend RaD-Net to its upgraded version, RaD-Net 2. Specifically, a causality-based knowledge distillation is introduced in the first stage to use future information in a causal way. We use the non-causal repairing network as the teacher to improve the performance of the causal repairing network. In addition, in the second stage, complex axial self-attention is applied in the denoising network's complex feature encoder/decoder. Experimental results on the ICASSP 2024 SSI Challenge blind test set show that RaD-Net 2 brings 0.10 OVRL DNSMOS improvement compared to RaD-Net.
Submitted 11 June, 2024;
originally announced June 2024.
-
BS-PLCNet 2: Two-stage Band-split Packet Loss Concealment Network with Intra-model Knowledge Distillation
Authors:
Zihan Zhang,
Xianjun Xia,
Chuanzeng Huang,
Yijian Xiao,
Lei Xie
Abstract:
Audio packet loss is an inevitable problem in real-time speech communication. A band-split packet loss concealment network (BS-PLCNet) targeting full-band signals was recently proposed. Although it achieves superior performance in the ICASSP 2024 PLC Challenge, BS-PLCNet is a large model with high computational complexity of 8.95G FLOPS. This paper presents its updated version, BS-PLCNet 2, to reduce computational complexity and improve performance further. Specifically, to compensate for the missing future information, in the wide-band module, we design a dual-path encoder structure (with non-causal and causal paths) and leverage an intra-model knowledge distillation strategy to distill the future information from the non-causal teacher to the causal student. Moreover, we introduce a lightweight post-processing module after packet loss restoration to recover speech distortions and remove residual noise in the audio signal. With only 40% of original parameters in BS-PLCNet, BS-PLCNet 2 brings 0.18 PLCMOS improvement on the ICASSP 2024 PLC challenge blind set, achieving state-of-the-art performance on this dataset.
Submitted 9 June, 2024;
originally announced June 2024.
-
A Simple Channel Independent Beamforming Scheme With Parallel Uniform Circular Array
Authors:
Haiyue **g,
Wenchi Cheng,
Xiang-Gen Xia
Abstract:
In this letter, we consider a uniform circular array (UCA)-based line-of-sight multiple-input-multiple-output system, where the transmit and receive UCAs are parallel but non-coaxial with each other. We propose a simple channel-independent beamforming scheme with fast symbol-wise maximum likelihood detection.
Submitted 6 June, 2024;
originally announced June 2024.
-
Asynchronous MIMO-OFDM Massive Unsourced Random Access with Codeword Collisions
Authors:
Tianya Li,
Yongpeng Wu,
Junyuan Gao,
Wenjun Zhang,
Xiang-Gen Xia,
Derrick Wing Kwan Ng,
Chengshan Xiao
Abstract:
This paper investigates asynchronous MIMO massive unsourced random access in an orthogonal frequency division multiplexing (OFDM) system over frequency-selective fading channels, with the presence of both timing and carrier frequency offsets (TO and CFO) and non-negligible codeword collisions. The proposed coding framework segregates the data into two components, namely, preamble and coding parts, with the former being tree-coded and the latter LDPC-coded. By leveraging the dual sparsity of the equivalent channel across both codeword and delay domains (CD and DD), we develop a message passing-based sparse Bayesian learning algorithm, combined with belief propagation and mean field, to iteratively estimate DD channel responses, TO, and delay profiles. Furthermore, we establish a novel graph-based algorithm to iteratively separate the superimposed channels and compensate for the phase rotations. Additionally, the proposed algorithm is applied to the flat fading scenario to estimate both TO and CFO, where the channel and offset estimation is enhanced by leveraging the geometric characteristics of the signal constellation. Simulations reveal that the proposed algorithm achieves superior performance and substantial complexity reduction in both channel and offset estimation compared to the codebook enlarging-based counterparts, and enhanced data recovery performances compared to state-of-the-art URA schemes.
Submitted 20 May, 2024;
originally announced May 2024.
-
Weighted Sum-Rate Maximization for Movable Antenna-Enhanced Wireless Networks
Authors:
Biqian Feng,
Yongpeng Wu,
Xiang-Gen Xia,
Chengshan Xiao
Abstract:
This letter investigates the weighted sum rate maximization problem in movable antenna (MA)-enhanced systems. To reduce the computational complexity, we transform it into a more tractable weighted minimum mean square error (WMMSE) problem well-suited for MA. We then adopt the WMMSE algorithm and majorization-minimization algorithm to optimize the beamforming and antenna positions, respectively. Moreover, we propose a planar movement mode, which constrains each MA to a specified area, and obtain a low-complexity closed-form solution. Numerical results demonstrate that the MA-enhanced system outperforms the conventional system. Besides, the computation time for the planar movement mode is reduced by approximately 30% at a small performance cost.
Submitted 15 April, 2024;
originally announced April 2024.
-
Precoder Design for User-Centric Network Massive MIMO with Matrix Manifold Optimization
Authors:
Rui Sun,
Li You,
An-An Lu,
Chen Sun,
Xiqi Gao,
Xiang-Gen Xia
Abstract:
In this paper, we investigate the precoder design for user-centric network (UCN) massive multiple-input multiple-output (mMIMO) downlink with matrix manifold optimization. In UCN mMIMO systems, each user terminal (UT) is served by a subset of base stations (BSs) instead of all the BSs, facilitating the implementation of the system and lowering the dimension of the precoders to be designed. By proving that the precoder set satisfying the per-BS power constraints forms a Riemannian submanifold of a linear product manifold, we transform the constrained precoder design problem in Euclidean space to an unconstrained one on the Riemannian submanifold. Riemannian ingredients, including orthogonal projection, Riemannian gradient, retraction and vector transport, of the problem on the Riemannian submanifold are further derived, with which the Riemannian conjugate gradient (RCG) design method is proposed for solving the unconstrained problem. The proposed method avoids the inverses of large dimensional matrices, which is beneficial in practice. The complexity analyses show the high computational efficiency of RCG precoder design. Simulation results demonstrate the numerical superiority of the proposed precoder design and the high efficiency of the UCN mMIMO system.
Submitted 10 April, 2024;
originally announced April 2024.
-
Optimizing Polynomial Graph Filters: A Novel Adaptive Krylov Subspace Approach
Authors:
Keke Huang,
Wencai Cao,
Hoang Ta,
Xiaokui Xiao,
Pietro Liò
Abstract:
Graph Neural Networks (GNNs), known as spectral graph filters, find a wide range of applications in web networks. To bypass eigendecomposition, polynomial graph filters are proposed to approximate graph filters by leveraging various polynomial bases for filter training. However, no existing studies have explored the diverse polynomial graph filters from a unified perspective for optimization.
In this paper, we first unify polynomial graph filters, as well as the optimal filters of identical degrees into the Krylov subspace of the same order, thus providing equivalent expressive power theoretically. Next, we investigate the asymptotic convergence property of polynomials from the unified Krylov subspace perspective, revealing their limited adaptability in graphs with varying heterophily degrees. Inspired by those facts, we design a novel adaptive Krylov subspace approach to optimize polynomial bases with provable controllability over the graph spectrum so as to adapt to graphs with various degrees of heterophily. Subsequently, we propose AdaptKry, an optimized polynomial graph filter utilizing bases from the adaptive Krylov subspaces. Meanwhile, in light of the diverse spectral properties of complex graphs, we extend AdaptKry by leveraging multiple adaptive Krylov bases without incurring extra training costs. As a consequence, extended AdaptKry is able to capture the intricate characteristics of graphs and provide insights into their inherent complexity. We conduct extensive experiments across a series of real-world datasets. The experimental results demonstrate the superior filtering capability of AdaptKry, as well as the optimized efficacy of the adaptive Krylov basis.
Submitted 20 May, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
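The abstract above places polynomial graph filters inside a Krylov subspace, span{x, Px, ..., P^K x}. The following is a generic sketch of that idea, assuming a symmetrically normalized adjacency matrix as the propagation matrix P and fixed, hand-picked weights; it is not the AdaptKry method, and all function names are hypothetical.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2} (assumed propagation matrix)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg, dtype=float)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    return d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

def krylov_basis(P, x, order):
    """Stack [x, Px, ..., P^order x] using only matrix-vector products."""
    vectors = [x]
    for _ in range(order):
        vectors.append(P @ vectors[-1])
    return np.stack(vectors)

def polynomial_filter(P, x, weights):
    """Apply sum_k weights[k] * P^k x without any eigendecomposition."""
    basis = krylov_basis(P, x, len(weights) - 1)
    return np.tensordot(weights, basis, axes=1)

# Usage: a 3-node path graph with an illustrative signal and weights.
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
P = normalized_adjacency(adj)
signal = np.array([1.0, 0.0, -1.0])
weights = np.array([0.5, 0.3, 0.2])
filtered = polynomial_filter(P, signal, weights)
```

Any degree-K polynomial filter is a linear combination of the rows returned by `krylov_basis`, so different polynomial bases differ only in how they parameterize the same subspace.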
-
An Intra-BRNN and GB-RVQ Based END-TO-END Neural Audio Codec
Authors:
Lin** Xu,
Jiawei Jiang,
Dejun Zhang,
Xianjun Xia,
Li Chen,
Yijian Xiao,
Piao Ding,
Shenyi Song,
Sixing Yin,
Ferdous Sohel
Abstract:
Recently, neural networks have proven to be effective in performing speech coding tasks at low bitrates. However, under-utilization of intra-frame correlations and quantizer error degrade the reconstructed audio quality. To improve the coding quality, we present an end-to-end neural speech codec, namely CBRC (Convolutional and Bidirectional Recurrent neural Codec). An interleaved structure using 1D-CNN and Intra-BRNN is designed to exploit the intra-frame correlations more efficiently. Furthermore, Group-wise and Beam-search Residual Vector Quantizer (GB-RVQ) is used to reduce the quantization noise. CBRC encodes audio every 20ms with no additional latency, which is suitable for real-time communication. Experimental results demonstrate the superiority of the proposed codec when comparing CBRC at 3kbps with Opus at 12kbps.
Submitted 2 February, 2024;
originally announced February 2024.
-
Predicting Mitral Valve mTEER Surgery Outcomes Using Machine Learning and Deep Learning Techniques
Authors:
Tejas Vyas,
Mohsena Chowdhury,
Xiaojiao Xiao,
Mathias Claeys,
Géraldine Ong,
Guanghui Wang
Abstract:
Mitral Transcatheter Edge-to-Edge Repair (mTEER) is a medical procedure utilized for the treatment of mitral valve disorders. However, predicting the outcome of the procedure poses a significant challenge. This paper makes the first attempt to harness classical machine learning (ML) and deep learning (DL) techniques for predicting mitral valve mTEER surgery outcomes. To achieve this, we compiled a dataset from 467 patients, encompassing labeled echocardiogram videos and patient reports containing Transesophageal Echocardiography (TEE) measurements detailing Mitral Valve Repair (MVR) treatment outcomes. Leveraging this dataset, we conducted a benchmark evaluation of six ML algorithms and two DL models. The results underscore the potential of ML and DL in predicting mTEER surgery outcomes, providing insight for future investigation and advancements in this domain.
Submitted 23 January, 2024;
originally announced January 2024.
-
NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription
Authors:
Alon Vinnikov,
Amir Ivry,
Aviv Hurvitz,
Igor Abramovski,
Sharon Koubi,
Ilya Gurvich,
Shai Pe'er,
Xiong Xiao,
Benjamin Martinez Elizalde,
Naoyuki Kanda,
Xiaofei Wang,
Shalev Shaer,
Stav Yagev,
Yossi Asher,
Sunit Sivasankaran,
Yifan Gong,
Min Tang,
Huaming Wang,
Eyal Krupka
Abstract:
We introduce the first Natural Office Talkers in Settings of Far-field Audio Recordings ("NOTSOFAR-1") Challenge alongside datasets and a baseline system. The challenge focuses on distant speaker diarization and automatic speech recognition (DASR) in far-field meeting scenarios, with single-channel and known-geometry multi-channel tracks, and serves as a launch platform for two new datasets: First, a benchmarking dataset of 315 meetings, averaging 6 minutes each, capturing a broad spectrum of real-world acoustic conditions and conversational dynamics. It is recorded across 30 conference rooms, featuring 4-8 attendees and a total of 35 unique speakers. Second, a 1000-hour simulated training dataset, synthesized with enhanced authenticity for real-world generalization, incorporating 15,000 real acoustic transfer functions. The tasks focus on single-device DASR, where multi-channel devices always share the same known geometry. This is aligned with common setups in actual conference rooms, and avoids technical complexities associated with multi-device tasks. It also allows for the development of geometry-specific solutions. The NOTSOFAR-1 Challenge aims to advance research in the field of distant conversational speech recognition, providing key resources to unlock the potential of data-driven methods, which we believe are currently constrained by the absence of comprehensive high-quality training and benchmarking datasets.
Submitted 16 January, 2024;
originally announced January 2024.
-
RaD-Net: A Repairing and Denoising Network for Speech Signal Improvement
Authors:
Mingshuai Liu,
Zhuangqi Chen,
Xiaopeng Yan,
Yuanjun Lv,
Xianjun Xia,
Chuanzeng Huang,
Yijian Xiao,
Lei Xie
Abstract:
This paper introduces our repairing and denoising network (RaD-Net) for the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. We extend our previous framework based on a two-stage network and propose an upgraded model. Specifically, we replace the repairing network with COM-Net from TEA-PSE. In addition, multi-resolution discriminators and multi-band discriminators are adopted in the training stage. Finally, we use a three-step training strategy to optimize our model. We submit two models with different sets of parameters to meet the RTF requirement of the two tracks. According to the official results, the proposed systems rank 2nd in track 1 and 3rd in track 2.
Submitted 9 January, 2024;
originally announced January 2024.
-
BS-PLCNet: Band-split Packet Loss Concealment Network with Multi-task Learning Framework and Multi-discriminators
Authors:
Zihan Zhang,
Jiayao Sun,
Xianjun Xia,
Chuanzeng Huang,
Yijian Xiao,
Lei Xie
Abstract:
Packet loss is a common and unavoidable problem in voice over internet phone (VoIP) systems. To deal with the problem, we propose a band-split packet loss concealment network (BS-PLCNet). Specifically, we split the full-band signal into wide-band (0-8kHz) and high-band (8-24kHz). The wide-band signals are processed by a gated convolutional recurrent network (GCRN), while the high-band counterpart is processed by a simple GRU network. To ensure high speech quality and automatic speech recognition (ASR) compatibility, a multi-task learning (MTL) framework, including fundamental frequency (f0) prediction and linguistic awareness, and multi-discriminators are used. The proposed approach tied for 1st place in the ICASSP 2024 PLC Challenge.
Submitted 8 January, 2024;
originally announced January 2024.
-
Real-Time FJ/MAC PDE Solvers via Tensorized, Back-Propagation-Free Optical PINN Training
Authors:
Yequan Zhao,
Xian Xiao,
Xinling Yu,
Ziyue Liu,
Zhixiong Chen,
Geza Kurczveil,
Raymond G. Beausoleil,
Zheng Zhang
Abstract:
Solving partial differential equations (PDEs) numerically often requires huge computing time, energy cost, and hardware resources in practical applications. This has limited their applications in many scenarios (e.g., autonomous systems, supersonic flows) that have a limited energy budget and require near real-time response. Leveraging optical computing, this paper develops an on-chip training framework for physics-informed neural networks (PINNs), aiming to solve high-dimensional PDEs with fJ/MAC photonic power consumption and ultra-low latency. Despite the ultra-high speed of optical neural networks, training a PINN on an optical chip is hard due to (1) the large size of photonic devices, and (2) the lack of scalable optical memory devices to store the intermediate results of back-propagation (BP). To enable realistic optical PINN training, this paper presents a scalable method to avoid the BP process. We also employ a tensor-compressed approach to improve the convergence and scalability of our optical PINN training. This training framework is designed with tensorized optical neural networks (TONN) for scalable inference acceleration and MZI phase-domain tuning for in-situ optimization. Our simulation results of a 20-dim HJB PDE show that our photonic accelerator can reduce the number of MZIs by a factor of $1.17\times 10^3$, with only $1.36$ J and $1.15$ s to solve this equation. This is the first real-size optical PINN training framework that can be applied to solve high-dimensional PDEs.
Submitted 4 January, 2024; v1 submitted 31 December, 2023;
originally announced January 2024.
-
Unlocking Deep Learning: A BP-Free Approach for Parallel Block-Wise Training of Neural Networks
Authors:
Anzhe Cheng,
Zhenkun Wang,
Chenzhong Yin,
Mingxi Cheng,
Heng **,
Xiongye Xiao,
Shahin Nazarian,
Paul Bogdan
Abstract:
Backpropagation (BP) has been a successful optimization technique for deep learning models. However, its limitations, such as backward- and update-locking, and its biological implausibility, hinder the concurrent updating of layers and do not mimic the local learning processes observed in the human brain. To address these issues, recent research has suggested using local error signals to asynchronously train network blocks. However, this approach often involves extensive trial-and-error iterations to determine the best configuration for local training. This includes decisions on how to decouple network blocks and which auxiliary networks to use for each block. In our work, we introduce a novel BP-free approach: a block-wise BP-free (BWBPF) neural network that leverages local error signals to optimize distinct sub-neural networks separately, where the global loss is only responsible for updating the output layer. The local error signals used in the BP-free model can be computed in parallel, enabling a potential speed-up in the weight update process through parallel implementation. Our experimental results consistently show that this approach can identify transferable decoupled architectures for VGG and ResNet variations, outperforming models trained with end-to-end backpropagation and other state-of-the-art block-wise learning techniques on datasets such as CIFAR-10 and Tiny-ImageNet. The code is released at https://github.com/Belis0811/BWBPF.
Submitted 20 December, 2023;
originally announced December 2023.
-
Channel Estimation for Movable Antenna Communication Systems: A Framework Based on Compressed Sensing
Authors:
Zhenyu Xiao,
Songqi Cao,
Lipeng Zhu,
Yanming Liu,
Xiang-Gen Xia,
Rui Zhang
Abstract:
Movable antenna (MA) is a new technology with great potential to improve communication performance by enabling local movement of antennas for pursuing better channel conditions. In particular, the acquisition of complete channel state information (CSI) between the transmitter (Tx) and receiver (Rx) regions is an essential problem for MA systems to reap performance gains. In this paper, we propose a general channel estimation framework for MA systems by exploiting the multi-path field response channel structure. Specifically, the angles of departure (AoDs), angles of arrival (AoAs), and complex coefficients of the multi-path components (MPCs) are jointly estimated by employing the compressed sensing method, based on multiple channel measurements at designated positions of the Tx-MA and Rx-MA. Under this framework, the Tx-MA and Rx-MA measurement positions fundamentally determine the measurement matrix for compressed sensing, of which the mutual coherence is analyzed from the perspective of Fourier transform. Moreover, two criteria for MA measurement positions are provided to guarantee the successful recovery of MPCs. Then, we propose several MA measurement position setups and compare their performance. Finally, comprehensive simulation results show that the proposed framework is able to estimate the complete CSI between the Tx and Rx regions with a high accuracy.
Submitted 11 December, 2023;
originally announced December 2023.
-
DiffusionTalker: Personalization and Acceleration for Speech-Driven 3D Face Diffuser
Authors:
Peng Chen,
Xiaobao Wei,
Ming Lu,
Yitong Zhu,
Naiming Yao,
Xingyu Xiao,
Hui Chen
Abstract:
Speech-driven 3D facial animation has been an attractive task in both academia and industry. Traditional methods mostly focus on learning a deterministic mapping from speech to animation. Recent approaches start to consider the non-deterministic nature of speech-driven 3D face animation and employ the diffusion model for the task. However, personalizing facial animation and accelerating animation generation are still two major limitations of existing diffusion-based methods. To address the above limitations, we propose DiffusionTalker, a diffusion-based method that utilizes contrastive learning to personalize 3D facial animation and knowledge distillation to accelerate 3D animation generation. Specifically, to enable personalization, we introduce a learnable talking identity to aggregate knowledge in audio sequences. The proposed identity embeddings extract customized facial cues across different people in a contrastive learning manner. During inference, users can obtain personalized facial animation based on input audio, reflecting a specific talking style. Starting from a trained diffusion model with hundreds of steps, we distill it into a lightweight model with 8 steps for acceleration. Extensive experiments are conducted to demonstrate that our method outperforms state-of-the-art methods. The code will be released.
Submitted 2 December, 2023; v1 submitted 28 November, 2023;
originally announced November 2023.
-
Robust Multidimentional Chinese Remainder Theorem for Integer Vector Reconstruction
Authors:
Li Xiao,
Haiye Huo,
Xiang-Gen Xia
Abstract:
The problem of robustly reconstructing an integer vector from its erroneous remainders appears in many applications in the field of multidimensional (MD) signal processing. To address this problem, a robust MD Chinese remainder theorem (CRT) was recently proposed for a special class of moduli, where the remaining integer matrices left-divided by a greatest common left divisor (gcld) of all the moduli are pairwise commutative and coprime. The strict constraint on the moduli limits the usefulness of the robust MD-CRT in practice. In this paper, we investigate the robust MD-CRT for a general set of moduli. We first introduce a necessary and sufficient condition on the difference between paired remainder errors, followed by a simple sufficient condition on the remainder error bound, for the robust MD-CRT for general moduli, where the conditions are associated with (the minimum distances of) these lattices generated by gcld's of paired moduli, and a closed-form reconstruction algorithm is presented. We then generalize the above results of the robust MD-CRT from integer vectors/matrices to real ones. Finally, we validate the robust MD-CRT for general moduli by employing numerical simulations, and apply it to MD sinusoidal frequency estimation based on multiple sub-Nyquist samplers.
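As a point of reference for the reconstruction problem described above, the following is a minimal sketch of the classical scalar CRT for pairwise coprime moduli. It is not the robust MD-CRT of the paper; the function name `crt` and the example values are illustrative assumptions.

```python
from math import gcd


def crt(remainders, moduli):
    """Reconstruct x in [0, M) from x mod m_i, where M is the product of
    the (pairwise coprime) moduli. Classical scalar CRT, no error tolerance."""
    x, M = 0, 1
    for r, m in zip(remainders, moduli):
        if gcd(M, m) != 1:
            raise ValueError("moduli must be pairwise coprime")
        # Find t such that x + M*t is congruent to r modulo m.
        # pow(M, -1, m) is the modular inverse (Python 3.8+).
        t = ((r - x) * pow(M, -1, m)) % m
        x += M * t
        M *= m
    return x


print(crt([2, 3, 2], [3, 5, 7]))  # error-free remainders of 23
print(crt([2, 3, 3], [3, 5, 7]))  # last remainder perturbed by one
```

With moduli 3, 5 and 7, the remainders (2, 3, 2) reconstruct 23, while perturbing the last remainder by one, to (2, 3, 3), reconstructs 38. This sensitivity to erroneous remainders is the kind of behavior a robust reconstruction is meant to control.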
Submitted 20 November, 2023;
originally announced November 2023.
-
Meta-DSP: A Meta-Learning Approach for Data-Driven Nonlinear Compensation in High-Speed Optical Fiber Systems
Authors:
Xinyu Xiao,
Zhennan Zhou,
Bin Dong,
Dingjiong Ma,
Li Zhou,
Jie Sun
Abstract:
Non-linear effects in long-haul, high-speed optical fiber systems significantly hinder channel capacity. While the Digital Backward Propagation algorithm (DBP) with adaptive filter (ADF) can mitigate these effects, it suffers from an overwhelming computational complexity. Recent solutions have incorporated deep neural networks in a data-driven strategy to alleviate this complexity in the DBP model. However, these models are often limited to a specific symbol rate and channel number, necessitating retraining for different settings, and their performance declines significantly under high-speed and high-power conditions. We introduce Meta-DSP, a novel data-driven nonlinear compensation model based on meta-learning that processes multi-modal data across diverse transmission rates, power levels, and channel numbers. This not only enhances signal quality but also substantially reduces the complexity of the nonlinear processing algorithm. Our model delivers a 0.7 dB increase in the Q-factor over Electronic Dispersion Compensation (EDC), and compared to DBP, it curtails computational complexity by a factor of ten while retaining comparable performance. From the perspective of the entire signal processing system, the core idea of Meta-DSP can be employed in any segment of the overall communication system to enhance the model's scalability and generalization performance. Our research substantiates Meta-DSP's proficiency in addressing the critical parameters defining optical communication networks.
Submitted 17 November, 2023;
originally announced November 2023.
-
Delay Doppler Transform
Authors:
Xiang-Gen Xia
Abstract:
This letter is to introduce delay Doppler transform (DDT) for a time domain signal. It is motivated by the recent studies in wireless communications over delay Doppler channels that have both time and Doppler spreads, such as, satellite communication channels. We present some simple properties of DDT as well. The DDT study may provide insights of delay Doppler channels.
Submitted 3 December, 2023; v1 submitted 9 November, 2023;
originally announced November 2023.
-
VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence
Authors:
Jianing Qiu,
Jian Wu,
Hao Wei,
Peilun Shi,
Minqing Zhang,
Yunyun Sun,
Lin Li,
Hanruo Liu,
Hongyi Liu,
Simeng Hou,
Yuyang Zhao,
Xuehui Shi,
Junfang Xian,
Xiaoxia Qu,
Sirui Zhu,
Lijie Pan,
Xiaoniao Chen,
Xiaojia Zhang,
Shuai Jiang,
Kebing Wang,
Chenlong Yang,
Mingqiang Chen,
Sujie Fan,
Jianhua Hu,
Aiguo Lv
, et al. (17 additional authors not shown)
Abstract:
We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals, covering a broad range of ophthalmic diseases, modalities, imaging devices, and demography. After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications, such as disease screening and diagnosis, disease prognosis, subclassification of disease phenotype, and systemic biomarker and disease prediction, with each application enhanced with expert-level intelligence and accuracy. The generalist intelligence of VisionFM outperformed ophthalmologists with basic and intermediate levels in jointly diagnosing 12 common ophthalmic diseases. Evaluated on a new large-scale ophthalmic disease diagnosis benchmark database, as well as a new large-scale segmentation and detection benchmark database, VisionFM outperformed strong baseline deep neural networks. The ophthalmic image representations learned by VisionFM exhibited noteworthy explainability, and demonstrated strong generalizability to new ophthalmic modalities, disease spectrum, and imaging devices. As a foundation model, VisionFM has a large capacity to learn from diverse ophthalmic imaging data and disparate datasets. To be commensurate with this capacity, in addition to the real data used for pre-training, we also generated and leveraged synthetic ophthalmic imaging data. Experimental results revealed that synthetic data that passed visual Turing tests, can also enhance the representation learning capability of VisionFM, leading to substantial performance gains on downstream ophthalmic AI tasks. Beyond the ophthalmic AI applications developed, validated, and demonstrated in this work, substantial further applications can be achieved in an efficient and cost-effective manner using VisionFM as the foundation.
Submitted 7 October, 2023;
originally announced October 2023.
-
An Exploration of Task-decoupling on Two-stage Neural Post Filter for Real-time Personalized Acoustic Echo Cancellation
Authors:
Zihan Zhang,
Jiayao Sun,
Xianjun Xia,
Ziqian Wang,
Xiaopeng Yan,
Yijian Xiao,
Lei Xie
Abstract:
Deep learning based techniques have been popularly adopted in acoustic echo cancellation (AEC). Utilization of speaker representation has extended the frontier of AEC, thus attracting many researchers' interest in personalized acoustic echo cancellation (PAEC). Meanwhile, task-decoupling strategies are widely adopted in speech enhancement. To further explore the task-decoupling approach, we propose to use a two-stage task-decoupling post-filter (TDPF) in PAEC. Furthermore, a multi-scale local-global speaker representation is applied to improve speaker extraction in PAEC. Experimental results indicate that the task-decoupling model can yield better performance than a single joint network. The optimal approach is to decouple the echo cancellation from noise and interference speech suppression. Based on the task-decoupling sequence, optimal training strategies for the two-stage model are explored afterwards.
Submitted 7 October, 2023;
originally announced October 2023.
-
Disturbance Rejection Control for Autonomous Trolley Collection Robots with Prescribed Performance
Authors:
Rui-Dong Xi,
Liang Lu,
Xue Zhang,
Xiao Xiao,
Bingyi Xia,
Jiankun Wang,
Max Q. -H. Meng
Abstract:
Trajectory tracking control of autonomous trolley collection robots (ATCR) is a challenging task due to the complex environment, serious noise and external disturbances. This work investigates a control scheme for ATCR subject to severe environmental interference. A kinematics-model-based adaptive sliding mode disturbance observer with fast convergence is first proposed to estimate the lumped disturbances. On this basis, a robust controller with prescribed performance is proposed using a backstepping technique, which improves the transient performance and guarantees fast convergence. Simulation results are provided to illustrate the effectiveness of the proposed control scheme.
Submitted 22 September, 2023;
originally announced September 2023.
-
Profile-Error-Tolerant Target-Speaker Voice Activity Detection
Authors:
Dongmei Wang,
Xiong Xiao,
Naoyuki Kanda,
Midia Yousefi,
Takuya Yoshioka,
Jian Wu
Abstract:
Target-Speaker Voice Activity Detection (TS-VAD) utilizes a set of speaker profiles alongside an input audio signal to perform speaker diarization. While its superiority over conventional methods has been demonstrated, the method can suffer from errors in speaker profiles, as those profiles are typically obtained by running a traditional clustering-based diarization method over the input signal. This paper proposes an extension to TS-VAD, called Profile-Error-Tolerant TS-VAD (PET-TSVAD), which is robust to such speaker profile errors. This is achieved by employing transformer-based TS-VAD that can handle a variable number of speakers and further introducing a set of additional pseudo-speaker profiles to handle speakers undetected during the first pass diarization. During training, we use speaker profiles estimated by multiple different clustering algorithms to reduce the mismatch between the training and testing conditions regarding speaker profiles. Experimental results show that PET-TSVAD consistently outperforms the existing TS-VAD method on both the VoxConverse and DIHARD-I datasets.
Submitted 3 April, 2024; v1 submitted 21 September, 2023;
originally announced September 2023.
-
CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis
Authors:
Yi Meng,
Xiang Li,
Zhiyong Wu,
Tingtian Li,
Zixun Sun,
Xinyu Xiao,
Chi Sun,
Hui Zhan,
Helen Meng
Abstract:
To further improve the speaking styles of synthesized speech, current text-to-speech (TTS) synthesis systems commonly employ reference speech to stylize their outputs instead of using just the input text. These reference utterances are obtained by manual selection, which is resource-consuming, or selected by semantic features. However, semantic features contain not only style-related information but also style-irrelevant information. The information irrelevant to speaking style in the text could interfere with the reference audio selection and result in improper speaking styles. To improve the reference selection, we propose the Contrastive Acoustic-Linguistic Module (CALM) to extract the Style-related Text Feature (STF) from the text. CALM optimizes the correlation between the speaking style embedding and the extracted STF with contrastive learning. Thus, a certain number of the most appropriate reference utterances for the input text are selected by retrieving the utterances with the top STF similarities. Then the style embeddings are combined in a weighted sum according to their STF similarities and used to stylize the synthesized speech of TTS. Experimental results demonstrate the effectiveness of our proposed approach: in both objective and subjective evaluations of the speaking styles of the synthesized speech, it outperforms a baseline approach with semantic-feature-based reference selection.
Submitted 30 August, 2023;
originally announced August 2023.
-
Understanding Turbo Codes: A Signal Processing Study
Authors:
Xiang-Gen Xia
Abstract:
In this paper, we study turbo codes from the digital signal processing point of view by defining turbo codes over the complex field. It is known that iterative decoding and interleaving between concatenated parallel codes are two key elements that make turbo codes perform significantly better than the conventional error control codes. This is analytically illustrated in this paper by showing that the decoded noise mean power in the iterative decoding decreases when the number of iterations increases, as long as the interleaving decorrelates the noise after each iterative decoding step. An analytic decreasing rate and the limit of the decoded noise mean power are given. The limit of the decoded noise mean power of the iterative decoding of a turbo code with two parallel codes with their rates less than 1/2 is one third of the noise power before the decoding, which can not be achieved by any non-turbo codes with the same rate. From this study, the role of designing a good interleaver can also be clearly seen.
Submitted 24 August, 2023;
originally announced August 2023.
-
Multiuser Communications with Movable-Antenna Base Station: Joint Antenna Positioning, Receive Combining, and Power Control
Authors:
Zhenyu Xiao,
Xiangyu Pi,
Lipeng Zhu,
Xiang-Gen Xia,
Rui Zhang
Abstract:
Movable antenna (MA) is an emerging technology which enables a local movement of the antenna in the transmitter/receiver region for improving the channel condition and communication performance. In this paper, we study the deployment of multiple MAs at the base station (BS) for enhancing the multiuser communication performance. First, we model the multiuser channel in the uplink to characterize the wireless channel variation due to MAs' movements at the BS. Then, an optimization problem is formulated to maximize the minimum achievable rate among multiple users for MA-aided uplink multiuser communications by jointly optimizing the MAs' positions, their receive combining at the BS, and the transmit power of users, under the constraints of finite moving region for MAs, minimum inter-MA distance, and maximum transmit power of each user. To solve this challenging non-convex optimization problem, a two-loop iterative algorithm is proposed by leveraging the particle swarm optimization (PSO) method. Specifically, the outer-loop updates the positions of a set of particles, where each particle's position represents one realization of the antenna position vector (APV) of all MAs. The inner-loop implements the fitness evaluation for each particle in terms of the max-min achievable rate of multiple users with its corresponding APV, where the receive combining matrix of the BS and the transmit power of each user are optimized by applying the block coordinate descent (BCD) technique. Simulation results show that the antenna position optimization for MAs-aided BSs can significantly improve the rate performance as compared to conventional BSs with fixed-position antennas (FPAs).
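The two-loop structure described above can be sketched with a generic particle swarm optimizer. This is a minimal sketch under stated assumptions, not the authors' algorithm: the name `pso_maximize`, the inertia and acceleration constants, the box constraint, and the toy fitness are all hypothetical, and the `fitness` callable merely stands in for the inner-loop BCD evaluation of the max-min rate.

```python
import random


def pso_maximize(fitness, dim, lo, hi, n_particles=20, n_iters=50, seed=0):
    """Generic PSO: each particle position plays the role of one candidate
    antenna position vector; fitness() plays the role of the inner loop."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # assumed inertia and acceleration weights
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):  # outer loop: move the particles
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                # Clip to the feasible region (a simple box in this sketch).
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])  # inner loop placeholder
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_fitness(p):
    # Hypothetical stand-in objective with its maximum (zero) at (1, 1).
    return -sum((v - 1.0) ** 2 for v in p)


best, best_val = pso_maximize(toy_fitness, dim=2, lo=-5.0, hi=5.0)
print(best, best_val)
```

The sketch omits the minimum inter-antenna distance constraint mentioned in the abstract; handling it, for example through a penalty in the fitness, would be a further design choice.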
Submitted 18 August, 2023;
originally announced August 2023.
-
CPCM: Contextual Point Cloud Modeling for Weakly-supervised Point Cloud Semantic Segmentation
Authors:
Lizhao Liu,
Zhuangwei Zhuang,
Shangxin Huang,
Xunlong Xiao,
Tianhang Xiang,
Cen Chen,
Jingdong Wang,
Mingkui Tan
Abstract:
We study the task of weakly-supervised point cloud semantic segmentation with sparse annotations (e.g., less than 0.1% points are labeled), aiming to reduce the expensive cost of dense annotations. Unfortunately, with extremely sparse annotated points, it is very difficult to extract both contextual and object information for scene understanding such as semantic segmentation. Motivated by masked modeling (e.g., MAE) in image and video representation learning, we seek to endow the power of masked modeling to learn contextual information from sparsely-annotated points. However, directly applying MAE to 3D point clouds with sparse annotations may fail to work. First, it is nontrivial to effectively mask out the informative visual context from 3D point clouds. Second, how to fully exploit the sparse annotations for context modeling remains an open question. In this paper, we propose a simple yet effective Contextual Point Cloud Modeling (CPCM) method that consists of two parts: a region-wise masking (RegionMask) strategy and a contextual masked training (CMT) method. Specifically, RegionMask masks the point cloud continuously in geometric space to construct a meaningful masked prediction task for subsequent context learning. CMT disentangles the learning of supervised segmentation and unsupervised masked context prediction for effectively learning the very limited labeled points and mass unlabeled points, respectively. Extensive experiments on the widely-tested ScanNet V2 and S3DIS benchmarks demonstrate the superiority of CPCM over the state-of-the-art.
Submitted 19 July, 2023;
originally announced July 2023.
-
Exploring the Potential of Integrated Optical Sensing and Communication (IOSAC) Systems with Si Waveguides for Future Networks
Authors:
Xiangpeng Ou,
Ying Qiu,
Ming Luo,
Fujun Sun,
Peng Zhang,
Gang Yang,
Junjie Li,
Jianfeng Gao,
Xiaobin He,
Anyan Du,
Bo Tang,
Bin Li,
Zichen Liu,
Zhihua Li,
Ling Xie,
Xi Xiao,
Jun Luo,
Wenwu Wang,
** Tao,
Yan Yang
Abstract:
Advanced silicon photonic technologies enable integrated optical sensing and communication (IOSAC) in real time for the emerging application requirements of simultaneous sensing and communication for next-generation networks. Here, we propose and demonstrate the IOSAC system on the silicon nitride (SiN) photonics platform. The IOSAC devices based on microring resonators are capable of monitoring the variation of analytes, transmitting the information to the terminal along with the modulated optical signal in real time, and replacing bulk optics in high-precision and high-speed applications. By directly integrating SiN ring resonators with optical communication networks, simultaneous sensing and optical communication are demonstrated by an optical signal transmission experimental system using specially filtered amplified spontaneous emission spectra. A refractive index (RI) sensing ring with a sensitivity of 172 nm/RIU, a figure of merit (FOM) of 1220, and a detection limit (DL) of $8.2\times10^{-6}$ RIU is demonstrated. Simultaneously, a 1.25 Gbps optical on-off-keying (OOK) signal is transmitted for NaCl solutions of different concentrations, which indicates that the bit-error-ratio (BER) decreases as the concentration increases. The novel IOSAC technology shows the potential to realize high-performance simultaneous biosensing and communication in real time and further accelerate the development of IoT and 6G networks.
Submitted 27 June, 2023;
originally announced July 2023.
-
Decoding Taste Information in Human Brain: A Temporal and Spatial Reconstruction Data Augmentation Method Coupled with Taste EEG
Authors:
Xiuxin Xia,
Yuchao Yang,
Yan Shi,
Wenbo Zheng,
Hong Men
Abstract:
For humans, taste is essential for perceiving food's nutrient content or harmful components. The current sensory evaluation of taste mainly relies on artificial sensory evaluation and electronic tongue, but the former has strong subjectivity and poor repeatability, and the latter is not flexible enough. This work proposed a strategy for acquiring and recognizing taste electroencephalogram (EEG), aiming to decode people's objective perception of taste through taste EEG. Firstly, according to the proposed experimental paradigm, the taste EEG of subjects under different taste stimulation was collected. Secondly, to avoid insufficient training of the model due to the small number of taste EEG samples, a Temporal and Spatial Reconstruction Data Augmentation (TSRDA) method was proposed, which effectively augmented the taste EEG by reconstructing the taste EEG's important features in temporal and spatial dimensions. Thirdly, a multi-view channel attention module was introduced into a designed convolutional neural network to extract the important features of the augmented taste EEG. The proposed method has accuracy of 99.56%, F1-score of 99.48%, and kappa of 99.38%, proving the method's ability to distinguish the taste EEG evoked by different taste stimuli successfully. In summary, combining TSRDA with taste EEG technology provides an objective and effective method for sensory evaluation of food taste.
Submitted 1 July, 2023;
originally announced July 2023.
-
A Joint Design for Full-duplex OFDM AF Relay System with Precoded Short Guard Interval
Authors:
Pu Yang,
Xiang-Gen Xia,
Qingyue Qu,
Han Wang,
Yi Liu
Abstract:
In-band full-duplex relay (FDR) has attracted much attention as an effective solution to improve the coverage and spectral efficiency in wireless communication networks. The basic problem for FDR transmission is how to eliminate the inherent self-interference and re-use the residual self-interference (RSI) at the relay to improve the end-to-end performance. Considering the RSI at the FDR, the overall equivalent channel can be modeled as an infinite impulse response (IIR) channel. For this IIR channel, a joint design for precoding, power gain control and equalization of cooperative OFDM relay systems is presented. Compared with the traditional OFDM systems, the length of the guard interval for the proposed design can be distinctly reduced, thereby improving the spectral efficiency. By analyzing the noise sources, this paper evaluates the signal to noise ratio (SNR) of the proposed scheme and presents a power gain control algorithm at the FDR. Compared with the existing schemes, the proposed scheme shows a superior bit error rate (BER) performance.
Submitted 7 July, 2023;
originally announced July 2023.
-
Edge-aware Multi-task Network for Integrating Quantification Segmentation and Uncertainty Prediction of Liver Tumor on Multi-modality Non-contrast MRI
Authors:
Xiaojiao Xiao,
Qinmin Hu,
Guanghui Wang
Abstract:
Simultaneous multi-index quantification, segmentation, and uncertainty estimation of liver tumors on multi-modality non-contrast magnetic resonance imaging (NCMRI) are crucial for accurate diagnosis. However, existing methods lack an effective mechanism for multi-modality NCMRI fusion and accurate boundary information capture, making these tasks challenging. To address these issues, this paper proposes a unified framework, namely edge-aware multi-task network (EaMtNet), to associate multi-index quantification, segmentation, and uncertainty of liver tumors on the multi-modality NCMRI. The EaMtNet employs two parallel CNN encoders and the Sobel filters to extract local features and edge maps, respectively. The newly designed edge-aware feature aggregation module (EaFA) is used for feature fusion and selection, making the network edge-aware by capturing long-range dependency between feature and edge maps. Multi-tasking leverages prediction discrepancy to estimate uncertainty and improve segmentation and quantification performance. Extensive experiments are performed on multi-modality NCMRI with 250 clinical subjects. The proposed model outperforms the state-of-the-art by a large margin, achieving a dice similarity coefficient of 90.01$\pm$1.23 and a mean absolute error of 2.72$\pm$0.58 mm for MD. The results demonstrate the potential of EaMtNet as a reliable clinical-aided tool for medical image analysis.
Submitted 4 July, 2023;
originally announced July 2023.
-
Experts' cognition-driven ensemble deep learning for external validation of predicting pathological complete response to neoadjuvant chemotherapy from histological images in breast cancer
Authors:
Yongquan Yang,
Fengling Li,
Yani Wei,
Yuanyuan Zhao,
**g Fu,
Xiuli Xiao,
Hong Bu
Abstract:
In breast cancer imaging, there has been a trend to directly predict pathological complete response (pCR) to neoadjuvant chemotherapy (NAC) from histological images based on deep learning (DL). However, it has been a commonly known problem that the constructed DL-based models numerically have better performances in internal validation than in external validation. The primary reason for this situation lies in that the distribution of the external data for validation is different from the distribution of the training data for the construction of the predictive model. In this paper, we aim to alleviate this situation with a more intrinsic approach. We propose an experts' cognition-driven ensemble deep learning (ECDEDL) approach for external validation of predicting pCR to NAC from histological images in breast cancer. The proposed ECDEDL, which takes the cognition of both pathology and artificial intelligence experts into consideration to improve the generalization of the predictive model to the external validation, more intrinsically approximates the working paradigm of a human being which will refer to his various working experiences to make decisions. The proposed ECDEDL approach was validated with 695 WSIs collected from the same center as the primary dataset to develop the predictive model and perform the internal validation, and 340 WSIs collected from other three centers as the external dataset to perform the external validation. In external validation, the proposed ECDEDL approach improves the AUCs of pCR prediction from 61.52(59.80-63.26) to 67.75(66.74-68.80) and the Accuracies of pCR prediction from 56.09(49.39-62.79) to 71.01(69.44-72.58). The proposed ECDEDL was quite effective for external validation, numerically more approximating the internal validation.
Submitted 19 June, 2023;
originally announced June 2023.
-
Harmonic enhancement using learnable comb filter for light-weight full-band speech enhancement model
Authors:
Xiaohuai Le,
Tong Lei,
Li Chen,
Yiqing Guo,
Chao He,
Cheng Chen,
Xianjun Xia,
Hua Gao,
Yijian Xiao,
Piao Ding,
Shenyi Song,
**g Lu
Abstract:
With fewer feature dimensions, filter banks are often used in light-weight full-band speech enhancement models. To further enhance the coarse speech in the sub-band domain, post-filtering is needed for harmonic retrieval. The signal processing-based comb filters used in RNNoise and PercepNet have limited performance and may degrade speech quality because of inaccurate fundamental frequency estimation. To tackle this problem, we propose a learnable comb filter to enhance harmonics. Based on the sub-band model, we design a DNN-based fundamental frequency estimator to estimate the discrete fundamental frequencies and a comb filter for harmonic enhancement, which are trained end-to-end. The experiments show the advantages of our proposed method over PercepNet and DeepFilterNet.
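For readers unfamiliar with the signal processing-based comb filters mentioned in this abstract, the following is a minimal sketch of a fixed feed-forward comb filter. It is not the learnable filter proposed in the paper: the pitch period is assumed to be known rather than estimated by a DNN, and the function name `comb_filter`, the `gain` parameter, and the demo values are hypothetical choices made for illustration.

```python
import math


def comb_filter(x, period, gain=0.5):
    """Feed-forward comb filter: y[n] = (1 - gain) * x[n] + gain * x[n - period].

    `period` is the pitch period in samples (sample rate / fundamental frequency).
    Samples before the start of the signal are treated as zero.
    """
    if period <= 0:
        raise ValueError("period must be a positive number of samples")
    y = []
    for n, sample in enumerate(x):
        delayed = x[n - period] if n >= period else 0.0
        y.append((1.0 - gain) * sample + gain * delayed)
    return y


# Hypothetical demo values: a pitch period of 8 samples.
PERIOD = 8
# A tone at the fundamental repeats every PERIOD samples.
harmonic = [math.sin(2 * math.pi * n / PERIOD) for n in range(64)]
# A tone at half the fundamental falls between the comb teeth.
between = [math.sin(2 * math.pi * n / (2 * PERIOD)) for n in range(64)]

kept = comb_filter(harmonic, PERIOD)     # passes unchanged once the delay line fills
removed = comb_filter(between, PERIOD)   # cancels once the delay line fills
```

With `gain=0.5`, components that repeat every `period` samples pass through unchanged, while components halfway between harmonics cancel. If `period` is wrong, the comb teeth no longer line up with the speech harmonics, which is the degradation from inaccurate fundamental frequency estimation that the abstract describes.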
Submitted 1 June, 2023;
originally announced June 2023.
-
A Graph-Based Collision Resolution Scheme for Asynchronous Unsourced Random Access
Authors:
Tianya Li,
Yongpeng Wu,
Wenjun Zhang,
Xiang-Gen Xia,
Chengshan Xiao
Abstract:
This paper investigates the multiple-input-multiple-output (MIMO) massive unsourced random access in an asynchronous orthogonal frequency division multiplexing (OFDM) system, with both timing and frequency offsets (TFO) and non-negligible user collisions. The proposed coding framework splits the data into two parts encoded by sparse regression code (SPARC) and low-density parity check (LDPC) code. Multistage orthogonal pilots are transmitted in the first part to reduce collision density. Unlike existing schemes requiring a quantization codebook with a large size for estimating TFO, we establish a \textit{graph-based channel reconstruction and collision resolution (GB-CR$^2$)} algorithm to iteratively reconstruct channels, resolve collisions, and compensate for TFO rotations on the formulated graph jointly among multiple stages. We further propose to leverage the geometric characteristics of signal constellations to correct TFO estimations. Exhaustive simulations demonstrate remarkable performance superiority in channel estimation and data recovery with substantial complexity reduction compared to state-of-the-art schemes.
Submitted 18 August, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Cross-lingual Knowledge Transfer and Iterative Pseudo-labeling for Low-Resource Speech Recognition with Transducers
Authors:
Jan Silovsky,
Liuhui Deng,
Arturo Argueta,
Tresi Arvizo,
Roger Hsiao,
Sasha Kuznietsov,
Yiu-Chang Lin,
Xiaoqiang Xiao,
Yuanyuan Zhang
Abstract:
Voice technology has become ubiquitous recently. However, the accuracy, and hence experience, in different languages varies significantly, which makes the technology not equally inclusive. The availability of data for different languages is one of the key factors affecting accuracy, especially in training of all-neural end-to-end automatic speech recognition systems.
Cross-lingual knowledge transfer and iterative pseudo-labeling are two techniques that have been shown to be successful for improving the accuracy of ASR systems, in particular for low-resource languages, like Ukrainian.
Our goal is to train an all-neural Transducer-based ASR system to replace a DNN-HMM hybrid system with no manually annotated training data. We show that the Transducer system trained using transcripts produced by the hybrid system achieves 18% reduction in terms of word error rate. However, using a combination of cross-lingual knowledge transfer from related languages and iterative pseudo-labeling, we are able to achieve 35% reduction of the error rate.
Submitted 22 May, 2023;
originally announced May 2023.
-
Proactive Content Caching Scheme in Urban Vehicular Networks
Authors:
Biqian Feng,
Chenyuan Feng,
Daquan Feng,
Yongpeng Wu,
Xiang-Gen Xia
Abstract:
Stream media content caching is a key enabling technology to promote the value chain of future urban vehicular networks. Nevertheless, the high mobility of vehicles, intermittency of information transmissions, high dynamics of user requests, limited caching capacities and extreme complexity of business scenarios pose an enormous challenge to content caching and distribution in vehicular networks. To tackle this problem, this paper aims to design a novel edge-computing-enabled hierarchical cooperative caching framework. First, we analyze the spatio-temporal correlation in the historical vehicle trajectories of user requests and construct the system model to predict the vehicle trajectory and content popularity, which lays a foundation for mobility-aware content caching and dispatching. Meanwhile, we probe into privacy protection strategies to realize a privacy-preserving prediction model. Furthermore, based on the trajectory and popular content prediction results, a content caching strategy is studied, and adaptive and dynamic resource management schemes are proposed for hierarchical cooperative caching networks. Finally, simulations are provided to verify the superiority of our proposed scheme and algorithms. The results show that the proposed algorithms effectively improve the performance of the considered system in terms of hit ratio and average delay, and narrow the gap to the optimal caching scheme compared with the traditional schemes.
Submitted 12 May, 2023;
originally announced May 2023.
-
Precoder Design for Massive MIMO Downlink with Matrix Manifold Optimization
Authors:
Rui Sun,
Chen Wang,
An-An Lu,
Xiqi Gao,
Xiang-Gen Xia
Abstract:
We investigate the weighted sum-rate (WSR) maximization linear precoder design for massive multiple-input multiple-output (MIMO) downlink. We consider a single-cell system with multiple users and propose a unified matrix manifold optimization framework applicable to total power constraint (TPC), per-user power constraint (PUPC) and per-antenna power constraint (PAPC). We prove that the precoders under TPC, PUPC and PAPC are on distinct Riemannian submanifolds, and transform the constrained problems in Euclidean space to unconstrained ones on manifolds. In accordance with this, we derive Riemannian ingredients, including orthogonal projection, Riemannian gradient, Riemannian Hessian, retraction and vector transport, which are needed for precoder design in the matrix manifold framework. Then, Riemannian design methods using Riemannian steepest descent, Riemannian conjugate gradient and Riemannian trust region are provided to design the WSR-maximization precoders under TPC, PUPC or PAPC. Riemannian methods do not involve the inverses of the large dimensional matrices during the iterations, reducing the computational complexities of the algorithms. Complexity analyses and performance simulations demonstrate the advantages of the proposed precoder design.
Submitted 10 April, 2024; v1 submitted 31 March, 2023;
originally announced April 2023.
-
Two-stage Neural Network for ICASSP 2023 Speech Signal Improvement Challenge
Authors:
Mingshuai Liu,
Shubo Lv,
Zihan Zhang,
Runduo Han,
Xiang Hao,
Xianjun Xia,
Li Chen,
Yijian Xiao,
Lei Xie
Abstract:
In the ICASSP 2023 Speech Signal Improvement Challenge, we developed a dual-stage neural model which improves the quality of speech signals degraded by different distortions in a stage-wise divide-and-conquer fashion. Specifically, in the first stage, the speech improvement network focuses on recovering the missing components of the spectrum, while in the second stage, our model aims to further suppress noise, reverberation, and artifacts introduced by the first-stage model. Achieving 0.446 in the final score and 0.517 in the P.835 score, our system ranks 4th in the non-real-time track.
Submitted 14 March, 2023;
originally announced March 2023.
-
Personalized speech enhancement combining band-split RNN and speaker attentive module
Authors:
Xiaohuai Le,
Li Chen,
Chao He,
Yiqing Guo,
Cheng Chen,
Xianjun Xia,
Jing Lu
Abstract:
Target speaker information can be utilized in speech enhancement (SE) models to more effectively extract the desired speech. Previous works introduce the speaker embedding into speech enhancement models by means of concatenation or affine transformation. In this paper, we propose a speaker attentive module to calculate the attention scores between the speaker embedding and the intermediate features, which are used to rescale the features. By merging this module in the state-of-the-art SE model, we construct the personalized SE model for ICASSP Signal Processing Grand Challenge: DNS Challenge 5 (2023). Our system achieves a final score of 0.529 on the blind test set of track1 and 0.549 on track2.
Submitted 16 March, 2023; v1 submitted 20 February, 2023;
originally announced February 2023.
-
Tensorized Optical Multimodal Fusion Network
Authors:
Yequan Zhao,
Xian Xiao,
Geza Kurczveil,
Raymond G. Beausoleil,
Zheng Zhang
Abstract:
We propose the first tensorized optical multimodal fusion network architecture with a self-attention mechanism and low-rank tensor fusion. Simulation results show $51.3 \times$ less hardware requirement and $3.7\times 10^{13}$ MAC/J energy efficiency.
Submitted 17 February, 2023;
originally announced February 2023.
-
Speaker Change Detection for Transformer Transducer ASR
Authors:
Jian Wu,
Zhuo Chen,
Min Hu,
Xiong Xiao,
Jinyu Li
Abstract:
Speaker change detection (SCD) is an important feature that improves the readability of the recognized words from an automatic speech recognition (ASR) system by breaking the word sequence into paragraphs at speaker change points. Existing SCD solutions either require additional ensemble for the time based decisions and recognized word sequences, or implement a tight integration between ASR and SCD, limiting the potential optimum performance for both tasks. To address these issues, we propose a novel framework for the SCD task, where an additional SCD module is built on top of an existing Transformer Transducer ASR (TT-ASR) network. Two variants of the SCD network are explored in this framework that naturally estimate speaker change probability for each word, while allowing the ASR and SCD to have independent optimization scheme for the best performance. Experiments show that our methods can significantly improve the F1 score on LibriCSS and Microsoft call center data sets without ASR degradation, compared with a joint SCD and ASR baseline.
Submitted 16 February, 2023;
originally announced February 2023.
-
From ORAN to Cell-Free RAN: Architecture, Performance Analysis, Testbeds and Trials
Authors:
Yang Cao,
Ziyang Zhang,
Xinjiang Xia,
Pengzhe Xin,
Dongjie Liu,
Kang Zheng,
Mengting Lou,
Jing Jin,
Qixing Wang,
Dongming Wang,
Yongming Huang,
Xiaohu You,
Jiangzhou Wang
Abstract:
Open radio access network (ORAN) provides an open architecture to implement radio access network (RAN) of the fifth generation (5G) and beyond mobile communications. As a key technology for the evolution to the sixth generation (6G) systems, cell-free massive multiple-input multiple-output (CF-mMIMO) can effectively improve the spectrum efficiency, peak rate and reliability of wireless communication systems. Starting from scalable implementation of CF-mMIMO, we study a cell-free RAN (CF-RAN) under the ORAN architecture. Through theoretical analysis and numerical simulation, we investigate the uplink and downlink spectral efficiencies of CF-mMIMO with the new architecture. We then discuss the implementation issues of CF-RAN under ORAN architecture, including time-frequency synchronization and over-the-air reciprocity calibration, low layer splitting, deployment of ORAN radio units (O-RU), artificial intelligent based user associations. Finally, we present some representative experimental results for the uplink distributed reception and downlink coherent joint transmission of CF-RAN with commercial off-the-shelf O-RUs.
Submitted 6 February, 2023; v1 submitted 30 January, 2023;
originally announced January 2023.
-
A New OFDM System for IIR Channels
Authors:
Xiang-Gen Xia
Abstract:
In this paper, we propose a new OFDM system for an IIR channel with the form of $B(z)/A(z)$ for two polynomials $A(z)$ and $B(z)$. Different from the conventional OFDM transmission over an FIR channel, a guard interval of an OFDM symbol is added such that the corresponding part at receiver is the cyclic prefix (CP) of the received OFDM symbol. The guard interval and CP lengths are the same and not smaller than the orders of polynomials $A(z)$ and $B(z)$. The OFDM symbol without the guard interval is the same as the conventional OFDM symbol without the CP. At the receiver, the IIR channel is then converted to $N$ intersymbol interference (ISI) free subchannels, where $N$ is the number of subcarriers of an OFDM symbol.
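The abstract contrasts its scheme with conventional OFDM over an FIR channel. As a point of reference only, the sketch below illustrates that conventional baseline, not the IIR scheme proposed in the paper: with a cyclic prefix at least as long as the FIR channel order, the channel acts as a circular convolution on each block, so the DFT splits it into $N$ ISI-free subchannels. All names (`cp_ofdm_roundtrip`, `fir_filter`) and the demo channel taps are hypothetical, and the channel is assumed to be noiseless with no spectral nulls.

```python
import cmath


def dft(x):
    """Plain O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT with 1/N normalization."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def fir_filter(h, x):
    """Linear convolution of x with taps h, truncated to len(x), zero initial state."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]


def cp_ofdm_roundtrip(symbols, h, cp_len):
    """Send one conventional CP-OFDM symbol through FIR channel h and equalize it."""
    n = len(symbols)
    block = idft(symbols)                      # time-domain OFDM symbol
    tx = block[n - cp_len:] + block            # prepend the cyclic prefix
    rx = fir_filter(h, tx)                     # FIR channel, no noise
    rx_block = rx[cp_len:]                     # receiver discards the prefix
    received = dft(rx_block)
    response = dft(list(h) + [0.0] * (n - len(h)))   # channel gain per subcarrier
    # One complex division per subcarrier: no intersymbol interference remains.
    return [y / g for y, g in zip(received, response)]


# Hypothetical demo values: 8 subcarriers, QPSK symbols, a 3-tap channel (order 2).
demo_symbols = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]
demo_channel = [1.0, 0.5, 0.25]
recovered = cp_ofdm_roundtrip(demo_symbols, demo_channel, cp_len=2)
```

The one-tap equalization works here only because the prefix length (2) is not smaller than the channel order (2). An IIR channel has an infinitely long impulse response, so no finite prefix of this kind suffices, which is the gap the abstract addresses with its guard-interval construction.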
Submitted 9 December, 2022;
originally announced December 2022.
-
Electromagnetic Environment Analysis of High-Power Wireless Charging Device
Authors:
Zhengyang Zhang,
Zhihui Liu,
Wen** Zhang,
Rui Zhang,
Xiang Xiao
Abstract:
Objective: To address the many interference factors in the electromagnetic radiation simulation of electric vehicles, a field measurement scheme for charging devices is designed. Through the monitoring of wireless charging equipment, the radiation level and distribution of the electric field and magnetic field values around the charging equipment are explored, to analyze the influence law of the electromagnetic environment. Method: This paper introduces the principle and development status of electric vehicle charging, and analyzes the classification and methods of wireless charging. The electric field and magnetic field of cars and minibuses are monitored at positions such as around the body, the attenuation section, inside the car and inside the minibus. Result: The range of electric field strength is 0.9 V/m to 48.1 V/m for cars, and 0.8 V/m to 74.7 V/m for minibuses. The electric field strength decays rapidly as the distance from the vehicle body increases, and the trend is clear. The range of magnetic induction intensity is 0.12 μT to 12.70 μT for cars, and 0.15 μT to 27.06 μT for minibuses. The magnetic induction intensity decays rapidly as the distance from the vehicle body increases, and the trend is clear. Conclusion: This paper explores the radiation level and distribution of the electromagnetic field around this type of charging equipment. It is recommended that manufacturers of wireless charging devices for electric vehicles strengthen research on electromagnetic radiation shielding and take corresponding measures to control the level of electromagnetic radiation in areas accessible to the public.
Submitted 8 December, 2022;
originally announced December 2022.
-
3-D Positioning and Resource Allocation for Multi-UAV Base Stations Under Blockage-Aware Channel Model
Authors:
Pengfei Yi,
Lipeng Zhu,
Zhenyu Xiao,
Rui Zhang,
Zhu Han,
Xiang-Gen Xia
Abstract:
In this paper, we propose to deploy multiple unmanned aerial vehicle (UAV) mounted base stations to serve ground users in outdoor environments with obstacles. In particular, the geographic information is employed to capture the blockage effects for air-to-ground (A2G) links caused by buildings, and a realistic blockage-aware A2G channel model is proposed to characterize the continuous variation of the channels at different locations. Based on the proposed channel model, we formulate the joint optimization problem of UAV three-dimensional (3-D) positioning and resource allocation, by power allocation, user association, and subcarrier allocation, to maximize the minimum achievable rate among users. To solve this non-convex combinatorial programming problem, we introduce a penalty term to relax it and develop a suboptimal solution via a penalty-based double-loop iterative optimization framework. The inner loop solves the penalized problem by employing the block successive convex approximation (BSCA) technique, where the UAV positioning and resource allocation are alternately optimized in each iteration. The outer loop aims to obtain proper penalty multipliers to ensure the solution of the penalized problem converges to that of the original problem. Simulation results demonstrate the superiority of the proposed algorithm over other benchmark schemes in terms of the minimum achievable rate.
Submitted 22 November, 2022;
originally announced November 2022.
-
Functional Split of In-Network Deep Learning for 6G: A Feasibility Study
Authors:
Jia He,
Huanzhuo Wu,
Xun Xiao,
Riccardo Bassoli,
Frank H. P. Fitzek
Abstract:
In existing mobile network systems, the data plane (DP) is mainly considered a pipeline consisting of network elements end-to-end forwarding user data traffics. With the rapid maturity of programmable network devices, however, mobile network infrastructure mutates towards a programmable computing platform. Therefore, such a programmable DP can provide in-network computing capability for many application services. In this paper, we target to enhance the data plane with in-network deep learning (DL) capability. However, in-network intelligence can be a significant load for network devices. Then, the paradigm of the functional split is applied so that the deep neural network (DNN) is decomposed into sub-elements of the data plane for making machine learning inference jobs more efficient. As a proof-of-concept, we take a Blind Source Separation (BSS) problem as an example to exhibit the benefits of such an approach. We implement the proposed enhancement in a full-stack emulator and we provide a quantitative evaluation with professional datasets. As an initial trial, our study provides insightful guidelines for the design of the future mobile network system, employing in-network intelligence (e.g., 6G).
Submitted 14 November, 2022;
originally announced November 2022.
-
Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization
Authors:
Dongmei Wang,
Xiong Xiao,
Naoyuki Kanda,
Takuya Yoshioka,
Jian Wu
Abstract:
This paper describes a speaker diarization model based on target speaker voice activity detection (TS-VAD) using transformers. To overcome the original TS-VAD model's drawback of being unable to handle an arbitrary number of speakers, we investigate model architectures that use input tensors with variable-length time and speaker dimensions. Transformer layers are applied to the speaker axis to make the model output insensitive to the order of the speaker profiles provided to the TS-VAD model. Time-wise sequential layers are interspersed between these speaker-wise transformer layers to allow the temporal and cross-speaker correlations of the input speech signal to be captured. We also extend a diarization model based on end-to-end neural diarization with encoder-decoder based attractors (EEND-EDA) by replacing its dot-product-based speaker detection layer with the transformer-based TS-VAD. Experimental results on VoxConverse show that using the transformers for the cross-speaker modeling reduces the diarization error rate (DER) of TS-VAD by 11.3%, achieving a new state-of-the-art (SOTA) DER of 4.57%. Also, our extended EEND-EDA reduces DER by 6.9% on the CALLHOME dataset relative to the original EEND-EDA with a similar model size, achieving a new SOTA DER of 11.18% under a widely used training data setting.
Submitted 25 September, 2022; v1 submitted 27 August, 2022;
originally announced August 2022.
-
Trajectory Tracking Control of the Bionic Joint Actuated by Pneumatic Artificial Muscle Based on Robust Modeling
Authors:
Yang Wang,
Qiang Zhang,
Xiao-hui Xiao
Abstract:
To simply and effectively realize the trajectory tracking control of a bionic joint actuated by a single pneumatic artificial muscle (PAM), a cascaded control strategy is proposed based on the robust modeling method. Firstly, the relationship between the input voltage of the proportional directional control valve and the inner driving pressure of the PAM is expressed analytically as a nonlinear model. Secondly, the nonlinear relationship between the driving pressure input of the PAM and the angular position output of the bionic joint is equivalently described as a second-order linear time-invariant (LTI) model accompanied by parametric perturbations, and the parameters of the model are then identified by the robust modeling method. Then, a hybrid model is established based on the two models (the nonlinear model and the LTI model), and a corresponding cascaded controller is developed. Its outer loop is an H-infinity controller for angular position tracking designed by the loop-shaping design procedure (LSDP), and its inner loop is a nonlinear controller based on feedback linearization theory for the PAM driving pressure control. Finally, the experiment is carried out within a joint rotation range of 90 degrees and with a working frequency upper bound of 1.25 rad/s. The joint with the developed cascaded controller tracks the given reference trajectories with steady-state errors smaller than 2%. Results show that the trajectory tracking control of a highly nonlinear system is highly efficient using the proposed strategy in the case of relatively low working frequency.
Submitted 10 August, 2022;
originally announced August 2022.