Search | arXiv e-print repository

A Comprehensive Overview and Survey of O-RAN: Exploring Slicing-aware Architecture, Deployment Options, and Use Cases

Authors: Khurshid Alam, Mohammad Asif Habibi, Matthias Tammen, Dennis Krummacker, Walid Saad, Marco Di Renzo, Tommaso Melodia, Xavier Costa-Pérez, Mérouane Debbah, Ashutosh Dutta, Hans D. Schotten

Abstract: Open-radio access network (O-RAN) seeks to establish principles of openness, programmability, automation, intelligence, and hardware-software disaggregation with interoperable interfaces. It advocates for multi-vendorism and multi-stakeholderism within a cloudified and virtualized wireless infrastructure, aimed at enhancing the deployment, operation, and maintenance of RAN architecture. This enhancement promises increased flexibility, performance optimization, service innovation, energy efficiency, and cost efficiency in fifth-generation (5G), sixth-generation (6G), and future networks. One of the key features of the O-RAN architecture is its support for network slicing, which entails interaction with other slicing domains within a mobile network, notably the transport network (TN) domain and the core network (CN) domain, to realize end-to-end (E2E) network slicing. The study of this feature requires exploring the stances and contributions of diverse standards development organizations (SDOs). In this context, we note that despite the ongoing industrial deployments and standardization efforts, the research and standardization communities have yet to comprehensively address network slicing in O-RAN. To address this gap, this survey paper provides a comprehensive exploration of network slicing in O-RAN through an in-depth review of specification documents from O-RAN Alliance and research papers from leading industry and academic institutions.
The paper commences with an overview of the ongoing standardization efforts and open-source contributions associated with O-RAN, subsequently delving into the latest O-RAN architecture with an emphasis on its slicing aspects. Further, the paper explores deployment scenarios for network slicing within O-RAN, examining options for the deployment and orchestration of O-RAN and TN network slice subnets...

Submitted 8 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

Comments: 45 pages, 12 figures, 4 tables, submitted to the IEEE for possible publication

arXiv:2402.03058

Array Geometry-Robust Attention-Based Neural Beamformer for Moving Speakers

Authors: Marvin Tammen, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Shoko Araki, Simon Doclo

Abstract: Although mask-based beamforming is a powerful speech enhancement approach, it often requires manual parameter tuning to handle moving speakers. Recently, this approach was augmented with an attention-based spatial covariance matrix aggregator (ASA) module, enabling accurate tracking of moving speakers without manual tuning. However, the deep neural network model used in this module is limited to specific microphone arrays, necessitating a different model for varying channel permutations, numbers, or geometries. To improve the robustness of the ASA module against such variations, in this paper we investigate three approaches: training with random channel configurations, employing the transform-average-concatenate method to process multi-channel input features, and utilizing robust input features. Our experiments on the CHiME-3 and DEMAND datasets show that these approaches enable the ASA-augmented beamformer to track moving speakers across different microphone arrays unseen in training.

Submitted 17 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: accepted at Interspeech 2024

arXiv:2205.13851

Speaker-conditioning Single-channel Target Speaker Extraction using Conformer-based Architectures

Authors: Ragini Sinha, Marvin Tammen, Christian Rollwage, Simon Doclo

Abstract: Target speaker extraction aims at extracting the target speaker from a mixture of multiple speakers exploiting auxiliary information about the target speaker. In this paper, we consider a complete time-domain target speaker extraction system consisting of a speaker embedder network and a speaker separator network which are jointly trained in an end-to-end learning process. We propose two different architectures for the speaker separator network which are based on the convolutional augmented transformer (conformer). The first architecture uses stacks of conformer and external feed-forward blocks (Conformer-FFN), while the second architecture uses stacks of temporal convolutional network (TCN) and conformer blocks (TCN-Conformer). Experimental results for 2-speaker mixtures, 3-speaker mixtures, and noisy mixtures of 2-speakers show that among the proposed separator networks, the TCN-Conformer significantly improves the target speaker extraction performance compared to the Conformer-FFN and a TCN-based baseline system.

Submitted 27 May, 2022; originally announced May 2022.

Comments: submitted to IWAENC 2022

arXiv:2205.09017

doi 10.1109/IWAENC53105.2022.9914710

Dictionary-Based Fusion of Contact and Acoustic Microphones for Wind Noise Reduction

Authors: Marvin Tammen, Xilin Li, Simon Doclo, Lalin Theverapperuma

Abstract: In mobile speech communication applications, wind noise can lead to a severe reduction of speech quality and intelligibility. Since the performance of speech enhancement algorithms using acoustic microphones tends to substantially degrade in extremely challenging scenarios, auxiliary sensors such as contact microphones can be used. Although contact microphones offer a much lower recorded wind noise level, they come at the cost of speech distortion and additional noise components. Aiming at exploiting the advantages of acoustic and contact microphones for wind noise reduction, in this paper we propose to extend conventional single-microphone dictionary-based speech enhancement approaches by simultaneously modeling the acoustic and contact microphone signals. We propose to train a single speech dictionary and two noise dictionaries and use a relative transfer function to model the relationship between the speech components at the microphones. Simulation results show that the proposed approach yields improvements in both speech quality and intelligibility compared to several baseline approaches, most notably approaches using only the contact microphones or only the acoustic microphone.

Submitted 14 November, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

Comments: accepted at IWAENC 22

arXiv:2205.08983

doi 10.1109/IWAENC53105.2022.9914742

Deep Multi-Frame MVDR Filtering for Binaural Noise Reduction

Authors: Marvin Tammen, Simon Doclo

Abstract: To improve speech intelligibility and speech quality in noisy environments, binaural noise reduction algorithms for head-mounted assistive listening devices are of crucial importance. Several binaural noise reduction algorithms such as the well-known binaural minimum variance distortionless response (MVDR) beamformer have been proposed, which exploit spatial correlations of both the target speech and the noise components. Furthermore, for single-microphone scenarios, multi-frame algorithms such as the multi-frame MVDR (MFMVDR) filter have been proposed, which exploit temporal instead of spatial correlations. In this contribution, we propose a binaural extension of the MFMVDR filter, which exploits both spatial and temporal correlations. The binaural MFMVDR filters are embedded in an end-to-end deep learning framework, where the required parameters, i.e., the speech spatio-temporal correlation vectors as well as the (inverse) noise spatio-temporal covariance matrix, are estimated by temporal convolutional networks (TCNs) that are trained by minimizing the mean spectral absolute error loss function. Simulation results comprising measured binaural room impulses and diverse noise sources at signal-to-noise ratios from -5 dB to 20 dB demonstrate the advantage of utilizing the binaural MFMVDR filter structure over directly estimating the binaural multi-frame filter coefficients with TCNs.

Submitted 14 November, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

Comments: accepted at IWAENC 2022

arXiv:2106.01902

Joint Multi-Channel Dereverberation and Noise Reduction Using a Unified Convolutional Beamformer With Sparse Priors

Authors: Henri Gode, Marvin Tammen, Simon Doclo

Abstract: Recently, the convolutional weighted power minimization distortionless response (WPD) beamformer was proposed, which unifies multi-channel weighted prediction error dereverberation and minimum power distortionless response beamforming. To optimize the convolutional filter, the desired speech component is modeled with a time-varying Gaussian model, which promotes the sparsity of the desired speech component in the short-time Fourier transform domain compared to the noisy microphone signals. In this paper we generalize the convolutional WPD beamformer by using an lp-norm cost function, introducing an adjustable shape parameter which enables to control the sparsity of the desired speech component. Experiments based on the REVERB challenge dataset show that the proposed method outperforms the conventional convolutional WPD beamformer in terms of objective speech quality metrics.

Submitted 13 March, 2023; v1 submitted 3 June, 2021; originally announced June 2021.

Comments: ITG Conference on Speech Communication

arXiv:2104.04234

Speaker-conditioned Target Speaker Extraction based on Customized LSTM Cells

Authors: Ragini Sinha, Marvin Tammen, Christian Rollwage, Simon Doclo

Abstract: Speaker-conditioned target speaker extraction systems rely on auxiliary information about the target speaker to extract the target speaker signal from a mixture of multiple speakers. Typically, a deep neural network is applied to isolate the relevant target speaker characteristics. In this paper, we focus on a single-channel target speaker extraction system based on a CNN-LSTM separator network and a speaker embedder network requiring reference speech of the target speaker. In the LSTM layer of the separator network, we propose to customize the LSTM cells in order to only remember the specific voice patterns corresponding to the target speaker by modifying the information processing in the forget gate. Experimental results for two-speaker mixtures using the Librispeech dataset show that this customization significantly improves the target speaker extraction performance compared to using standard LSTM cells.

Submitted 9 April, 2021; originally announced April 2021.

arXiv:2011.10345

doi 10.1109/ICASSP39728.2021.9413775

Deep Multi-Frame MVDR Filtering for Single-Microphone Speech Enhancement

Authors: Marvin Tammen, Simon Doclo

Abstract: Multi-frame algorithms for single-microphone speech enhancement, e.g., the multi-frame minimum variance distortionless response (MFMVDR) filter, are able to exploit speech correlation across adjacent time frames in the short-time Fourier transform (STFT) domain. Provided that accurate estimates of the required speech interframe correlation vector and the noise correlation matrix are available, it has been shown that the MFMVDR filter yields a substantial noise reduction while hardly introducing any speech distortion. Aiming at merging the speech enhancement potential of the MFMVDR filter and the estimation capability of temporal convolutional networks (TCNs), in this paper we propose to embed the MFMVDR filter within a deep learning framework. The TCNs are trained to map the noisy speech STFT coefficients to the required quantities by minimizing the scale-invariant signal-to-distortion ratio loss function at the MFMVDR filter output. Experimental results show that the proposed deep MFMVDR filter achieves a competitive speech enhancement performance on the Deep Noise Suppression Challenge dataset. In particular, the results show that estimating the parameters of an MFMVDR filter yields a higher performance in terms of PESQ and STOI than directly estimating the multi-frame filter or single-frame masks and than Conv-TasNet.

Submitted 20 November, 2020; originally announced November 2020.

Comments: submitted to the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Ontario, Canada

arXiv:1905.08492

doi 10.1109/ICASSP40776.2020.9054196

DNN-Based Speech Presence Probability Estimation for Multi-Frame Single-Microphone Speech Enhancement

Authors: Marvin Tammen, Dörte Fischer, Bernd T. Meyer, Simon Doclo

Abstract: Multi-frame approaches for single-microphone speech enhancement, e.g., the multi-frame minimum-power-distortionless-response (MFMPDR) filter, are able to exploit speech correlations across neighboring time frames. In contrast to single-frame approaches such as the Wiener gain, it has been shown that multi-frame approaches achieve a substantial noise reduction with hardly any speech distortion, provided that an accurate estimate of the correlation matrices and especially the speech interframe correlation (IFC) vector is available. Typical estimation procedures of the IFC vector require an estimate of the speech presence probability (SPP) in each time-frequency (TF) bin. In this paper, we propose to use a bi-directional long short-term memory deep neural network (DNN) to estimate the SPP for each TF bin. Aiming at achieving a robust performance, the DNN is trained for various noise types and within a large signal-to-noise-ratio range. Experimental results show that the MFMPDR in combination with the proposed data-driven SPP estimator yields an increased speech quality compared to a state-of-the-art model-based SPP estimator. Furthermore, it is confirmed that exploiting interframe correlations in the MFMPDR is beneficial when compared to the Wiener gain especially in adverse scenarios.

Submitted 14 November, 2022; v1 submitted 21 May, 2019; originally announced May 2019.

arXiv:1804.06196

Demystifying Deception Technology: A Survey

Authors: Daniel Fraunholz, Simon Duque Anton, Christoph Lipps, Daniel Reti, Daniel Krohmer, Frederic Pohl, Matthias Tammen, Hans Dieter Schotten

Abstract: Deception boosts security for systems and components by denial, deceit, misinformation, camouflage and obfuscation. In this work an extensive overview of the deception technology environment is presented. Taxonomies, theoretical backgrounds, psychological aspects as well as concepts, implementations, legal aspects and ethics are discussed and compared.

Submitted 17 April, 2018; originally announced April 2018.

Comments: 25 pages, 169 references

Showing 1–10 of 10 results for author: Tammen, M