A Closer Look at Wav2Vec2 Embeddings for On-Device Single-Channel Speech Enhancement

Authors: Ravi Shankar, Ke Tan, Buye Xu, Anurag Kumar

Abstract: Self-supervised learned models have been found to be very effective for certain speech tasks such as automatic speech recognition, speaker identification, keyword spotting and others. While the features are undeniably useful in speech recognition and associated tasks, their utility in speech enhancement systems is yet to be firmly established, and perhaps not properly understood. In this paper, we investigate the uses of SSL representations for single-channel speech enhancement in challenging conditions and find that they add very little value for the enhancement task. Our constraints are designed around on-device real-time speech enhancement -- model is causal, the compute footprint is small. Additionally, we focus on low SNR conditions where such models struggle to provide good enhancement. In order to systematically examine how SSL representations impact performance of such enhancement models, we propose a variety of techniques to utilize these embeddings which include different forms of knowledge-distillation and pre-training.

Submitted 2 March, 2024; originally announced March 2024.

Comments: 8 pages; Shorter form accepted in ICASSP 2024

arXiv:2211.07798 [pdf, other]

A Uniform Sampling Procedure for Abstract Triangulations of Surfaces

Authors: Rajan Shankar, Jonathan Spreer

Abstract: We present a procedure to sample uniformly from the set of combinatorial isomorphism types of balanced triangulations of surfaces - also known as graph-encoded surfaces. For a given number $n$, the sample is a weighted set of graph-encoded surfaces with $2n$ triangles. The sampling procedure relies on connections between graph-encoded surfaces and permutations, and basic properties of the symmetric group. We implement our method and present a number of experimental findings based on the analysis of $138$ million runs of our sampling procedure, producing graph-encoded surfaces with up to $280$ triangles. Namely, we determine that, for $n$ fixed, the empirical mean genus $\bar{g}(n)$ of our sample is very close to $\bar{g}(n) = \frac{n-1}{2} - (16.98n -110.61)^{1/4}$. Moreover, we present experimental evidence that the associated genus distribution more and more concentrates on a vanishing portion of all possible genera as $n$ tends to infinity. Finally, we observe from our data that the mean number of non-trivial symmetries of a uniformly chosen graph encoding of a surface decays to zero at a rate super-exponential in $n$.

Submitted 14 November, 2022; originally announced November 2022.

Comments: 12 pages, 17 figures

MSC Class: 57Q15; 57N05; 20B30; 05C15; 05C80

Journal ref: This paper will be published in the proceedings of the SIAM Symposium on Algorithm Engineering and Experiments (ALENEX) 2023

arXiv:2211.05071 [pdf, other]

A Diffeomorphic Flow-based Variational Framework for Multi-speaker Emotion Conversion

Authors: Ravi Shankar, Hsi-Wei Hsieh, Nicolas Charon, Archana Venkataraman

Abstract: This paper introduces a new framework for non-parallel emotion conversion in speech. Our framework is based on two key contributions. First, we propose a stochastic version of the popular CycleGAN model. Our modified loss function introduces a Kullback-Leibler (KL) divergence term that aligns the source and target data distributions learned by the generators, thus overcoming the limitations of sample-wise generation. By using a variational approximation to this stochastic loss function, we show that our KL divergence term can be implemented via a paired density discriminator. We term this new architecture a variational CycleGAN (VCGAN). Second, we model the prosodic features of target emotion as a smooth and learnable deformation of the source prosodic features. This approach provides implicit regularization that offers key advantages in terms of better range alignment to unseen and out-of-distribution speakers. We conduct rigorous experiments and comparative studies to demonstrate that our proposed framework is fairly robust with high performance against several state-of-the-art baselines.

Submitted 9 November, 2022; originally announced November 2022.

Comments: Accepted in IEEE Transactions on Audio, Speech and Language Processing

arXiv:2211.05047 [pdf, other]

A Comparative Study of Data Augmentation Techniques for Deep Learning Based Emotion Recognition

Authors: Ravi Shankar, Abdouh Harouna Kenfack, Arjun Somayazulu, Archana Venkataraman

Abstract: Automated emotion recognition in speech is a long-standing problem. While early work on emotion recognition relied on hand-crafted features and simple classifiers, the field has now embraced end-to-end feature learning and classification using deep neural networks. In parallel to these models, researchers have proposed several data augmentation techniques to increase the size and variability of existing labeled datasets. Despite many seminal contributions in the field, we still have a poor understanding of the interplay between the network architecture and the choice of data augmentation. Moreover, only a handful of studies demonstrate the generalizability of a particular model across multiple datasets, which is a prerequisite for robust real-world performance. In this paper, we conduct a comprehensive evaluation of popular deep learning approaches for emotion recognition. To eliminate bias, we fix the model architectures and optimization hyperparameters using the VESUS dataset and then use repeated 5-fold cross validation to evaluate the performance on the IEMOCAP and CREMA-D datasets. Our results demonstrate that long-range dependencies in the speech signal are critical for emotion recognition and that speed/rate augmentation offers the most robust performance gain across models.

Submitted 9 November, 2022; originally announced November 2022.

Comments: Under Submission

arXiv:2207.02157 [pdf, other]

Multi-IRS-Aided Doppler-Tolerant Wideband DFRC System

Authors: Tong Wei, Linlong Wu, Kumar Vijay Mishra, M. R. Bhavani Shankar

Abstract: Intelligent reflecting surface (IRS) is recognized as an enabler of future dual-function radar-communications (DFRC) by improving spectral efficiency, coverage, parameter estimation, and interference suppression. Prior studies on IRS-aided DFRC focus either on narrowband processing, single-IRS deployment, static targets, non-clutter scenario, or on the under-utilized line-of-sight (LoS) and non-line-of-sight (NLoS) paths. In this paper, we address the aforementioned shortcomings by optimizing a wideband DFRC system comprising multiple IRSs and a dual-function base station that jointly processes the LoS and NLoS wideband multi-carrier signals to improve both the communications SINR and the radar SINR in the presence of a moving target and clutter. We formulate the transmit, receive and IRS beamformer design as the maximization of the worst-case radar signal-to-interference-plus-noise ratio (SINR) subject to transmit power and communications SINR. We tackle this nonconvex problem under the alternating optimization framework, where the subproblems are solved by a combination of the Dinkelbach algorithm, consensus alternating direction method of multipliers, and Riemannian steepest descent. Our numerical experiments show that the proposed multi-IRS-aided wideband DFRC provides over $4$ dB radar SINR and $31.7$\% improvement in target detection over a single-IRS system.

Submitted 10 August, 2023; v1 submitted 5 July, 2022; originally announced July 2022.

Comments: 16 pages, 8 figures, 2 tables

arXiv:2205.15952 [pdf, other]

Knowledge Graph - Deep Learning: A Case Study in Question Answering in Aviation Safety Domain

Authors: Ankush Agarwal, Raj Gite, Shreya Laddha, Pushpak Bhattacharyya, Satyanarayan Kar, Asif Ekbal, Prabhjit Thind, Rajesh Zele, Ravi Shankar

Abstract: In the commercial aviation domain, there are a large number of documents, such as accident reports (NTSB, ASRS) and regulatory directives (ADs). There is a need for a system to access these diverse repositories efficiently in order to service needs in the aviation industry, such as maintenance, compliance, and safety. In this paper, we propose a Knowledge Graph (KG) guided Deep Learning (DL) based Question Answering (QA) system for aviation safety. We construct a Knowledge Graph from Aircraft Accident reports and contribute this resource to the community of researchers. The efficacy of this resource is tested and proved by the aforesaid QA system. Natural Language Queries constructed from the documents mentioned above are converted into SPARQL (the interface language of the RDF graph database) queries and answered. On the DL side, we have two different QA models: (i) BERT QA which is a pipeline of Passage Retrieval (Sentence-BERT based) and Question Answering (BERT based), and (ii) the recently released GPT-3. We evaluate our system on a set of queries created from the accident reports. Our combined QA system achieves a 9.3% increase in accuracy over GPT-3 and a 40.3% increase over BERT QA. Thus, we infer that KG-DL performs better than either singly.

Submitted 9 June, 2022; v1 submitted 31 May, 2022; originally announced May 2022.

Comments: LREC 2022 Main Conference Accepted Paper

arXiv:2204.07265 [pdf, other]

doi 10.1109/MNET.128.2200446

The Rise of Intelligent Reflecting Surfaces in Integrated Sensing and Communications Paradigms

Authors: Ahmet M. Elbir, Kumar Vijay Mishra, M. R. Bhavani Shankar, Symeon Chatzinotas

Abstract: The intelligent reflecting surface (IRS) alters the behavior of wireless media and, consequently, has potential to improve the performance and reliability of wireless systems such as communications and radar remote sensing. Recently, integrated sensing and communications (ISAC) has been widely studied as a means to efficiently utilize spectrum and thereby save cost and power. This article investigates the role of IRS in the future ISAC paradigms. While there is a rich heritage of recent research into IRS-assisted communications, the IRS-assisted radars and ISAC remain relatively unexamined. We discuss the putative advantages of IRS deployment, such as coverage extension, interference suppression, and enhanced parameter estimation, for both communications and radar. We introduce possible IRS-assisted ISAC scenarios with common and dedicated surfaces. The article provides an overview of related signal processing techniques and the design challenges, such as wireless channel acquisition, waveform design, and security.

Submitted 20 December, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

Comments: Accepted paper in IEEE Network Magazine

Journal ref: IEEE Network, 2023

arXiv:2202.12014 [pdf, other]

TriggerCit: Early Flood Alerting using Twitter and Geolocation -- a comparison with alternative sources

Authors: Carlo Bono, Barbara Pernici, Jose Luis Fernandez-Marquez, Amudha Ravi Shankar, Mehmet Oğuz Mülâyim, Edoardo Nemni

Abstract: Rapid impact assessment in the immediate aftermath of a natural disaster is essential to provide adequate information to international organisations, local authorities, and first responders. Social media can support emergency response with evidence-based content posted by citizens and organisations during ongoing events. In the paper, we propose TriggerCit: an early flood alerting tool with a multilanguage approach focused on timeliness and geolocation. The paper focuses on assessing the reliability of the approach as a triggering system, comparing it with alternative sources for alerts, and evaluating the quality and amount of complementary information gathered. Geolocated visual evidence extracted from Twitter by TriggerCit was analysed in two case studies on floods in Thailand and Nepal in 2021.

Submitted 5 March, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

Comments: 12 pages Keywords Social Media, Disaster management, Early Alerting

arXiv:2107.04973 [pdf, other]

A Deep-Bayesian Framework for Adaptive Speech Duration Modification

Authors: Ravi Shankar, Archana Venkataraman

Abstract: We propose the first method to adaptively modify the duration of a given speech signal. Our approach uses a Bayesian framework to define a latent attention map that links frames of the input and target utterances. We train a masked convolutional encoder-decoder network to produce this attention map via a stochastic version of the mean absolute error loss function; our model also predicts the length of the target speech signal using the encoder embeddings. The predicted length determines the number of steps for the decoder operation. During inference, we generate the attention map as a proxy for the similarity matrix between the given input speech and an unknown target speech signal. Using this similarity matrix, we compute a warping path of alignment between the two signals. Our experiments demonstrate that this adaptive framework produces similar results to dynamic time warping, which relies on a known target signal, on both voice conversion and emotion conversion tasks. We also show that our technique results in a high quality of generated speech that is on par with state-of-the-art vocoders.

Submitted 11 July, 2021; originally announced July 2021.

Comments: 6 pages, 7 figures

arXiv:2106.15764 [pdf, other]

The Threat of Offensive AI to Organizations

Authors: Yisroel Mirsky, Ambra Demontis, Jaidip Kotak, Ram Shankar, Deng Gelei, Liu Yang, Xiangyu Zhang, Wenke Lee, Yuval Elovici, Battista Biggio

Abstract: AI has provided us with the ability to automate tasks, extract information from vast amounts of data, and synthesize media that is nearly indistinguishable from the real thing. However, positive tools can also be used for negative purposes. In particular, cyber adversaries can use AI (such as machine learning) to enhance their attacks and expand their campaigns. Although offensive AI has been discussed in the past, there is a need to analyze and understand the threat in the context of organizations. For example, how does an AI-capable adversary impact the cyber kill chain? Does AI benefit the attacker more than the defender? What are the most significant AI threats facing organizations today and what will be their impact on the future? In this survey, we explore the threat of offensive AI on organizations. First, we present the background and discuss how AI changes the adversary's methods, strategies, goals, and overall attack model. Then, through a literature review, we identify 33 offensive AI capabilities which adversaries can use to enhance their attacks. Finally, through a user study spanning industry and academia, we rank the AI threats and provide insights on the adversaries.

Submitted 29 June, 2021; originally announced June 2021.

arXiv:2010.03021 [pdf, other]

Image-based Social Sensing: Combining AI and the Crowd to Mine Policy-Adherence Indicators from Twitter

Authors: Virginia Negri, Dario Scuratti, Stefano Agresti, Donya Rooein, Gabriele Scalia, Amudha Ravi Shankar, Jose Luis Fernandez Marquez, Mark James Carman, Barbara Pernici

Abstract: Social Media provides a trove of information that, if aggregated and analysed appropriately, can provide important statistical indicators to policy makers. In some situations these indicators are not available through other mechanisms. For example, given the ongoing COVID-19 outbreak, it is essential for governments to have access to reliable data on policy-adherence with regards to mask wearing, social distancing, and other hard-to-measure quantities. In this paper we investigate whether it is possible to obtain such data by aggregating information from images posted to social media. The paper presents VisualCit, a pipeline for image-based social sensing combining recent advances in image recognition technology with geocoding and crowdsourcing techniques. Our aim is to discover in which countries, and to what extent, people are following COVID-19 related policy directives. We compared the results with the indicators produced within the CovidDataHub behavior tracker initiative. Preliminary results show that social media images can produce reliable indicators for policy makers.

Submitted 5 March, 2021; v1 submitted 6 October, 2020; originally announced October 2020.

Comments: 10 pages, 9 figures, to be published in Proceedings of ICSE Software Engineering in Society, May 2021

arXiv:2007.15108 [pdf, other]

doi 10.1109/TSP.2021.3072834

Localization with One-Bit Passive Radars in Narrowband Internet-of-Things using Multivariate Polynomial Optimization

Authors: Saeid Sedighi, Kumar Vijay Mishra, M. R. Bhavani Shankar, Björn Ottersten

Abstract: Several Internet-of-Things (IoT) applications provide location-based services, wherein it is critical to obtain accurate position estimates by aggregating information from individual sensors. In the recently proposed narrowband IoT (NB-IoT) standard, which trades off bandwidth to gain wide coverage, the location estimation is compounded by the low sampling rate receivers and limited-capacity links. We address both of these NB-IoT drawbacks in the framework of passive sensing devices that receive signals from the target-of-interest. We consider the limiting case where each node receiver employs one-bit analog-to-digital-converters and propose a novel low-complexity nodal delay estimation method using constrained-weighted least squares minimization. To support the low-capacity links to the fusion center (FC), the range estimates obtained at individual sensors are then converted to one-bit data. At the FC, we propose target localization with the aggregated one-bit range vector using both optimal and sub-optimal techniques. The computationally expensive former approach is based on Lasserre's method for multivariate polynomial optimization while the latter employs our less complex iterative joint r\textit{an}ge-\textit{tar}get location \textit{es}timation (ANTARES) algorithm. Our overall one-bit framework not only complements the low NB-IoT bandwidth but also supports the design goal of inexpensive NB-IoT location sensing.
Numerical experiments demonstrate feasibility of the proposed one-bit approach with a $0.6$\% increase in the normalized localization error for the small set of $20$-$60$ nodes over the full-precision case. When the number of nodes is sufficiently large ($>80$), the one-bit methods yield the same performance as the full precision.

Submitted 9 April, 2021; v1 submitted 29 July, 2020; originally announced July 2020.

Comments: 16 pages, 11 figures

arXiv:2007.12937 [pdf, other]

Multi-speaker Emotion Conversion via Latent Variable Regularization and a Chained Encoder-Decoder-Predictor Network

Authors: Ravi Shankar, Hsi-Wei Hsieh, Nicolas Charon, Archana Venkataraman

Abstract: We propose a novel method for emotion conversion in speech based on a chained encoder-decoder-predictor neural network architecture. The encoder constructs a latent embedding of the fundamental frequency (F0) contour and the spectrum, which we regularize using the Large Diffeomorphic Metric Mapping (LDDMM) registration framework. The decoder uses this embedding to predict the modified F0 contour in a target emotional class. Finally, the predictor uses the original spectrum and the modified F0 contour to generate a corresponding target spectrum. Our joint objective function simultaneously optimizes the parameters of three model blocks. We show that our method outperforms the existing state-of-the-art approaches on both the saliency of emotion conversion and the quality of resynthesized speech. In addition, the LDDMM regularization allows our model to convert phrases that were not present in training, thus providing evidence for out-of-sample generalization.

Submitted 10 August, 2020; v1 submitted 25 July, 2020; originally announced July 2020.

Comments: Paper Accepted in Interspeech 2020

arXiv:2007.12932 [pdf, other]

Non-parallel Emotion Conversion using a Deep-Generative Hybrid Network and an Adversarial Pair Discriminator

Authors: Ravi Shankar, Jacob Sager, Archana Venkataraman

Abstract: We introduce a novel method for emotion conversion in speech that does not require parallel training data. Our approach loosely relies on a cycle-GAN schema to minimize the reconstruction error from converting back and forth between emotion pairs. However, unlike the conventional cycle-GAN, our discriminator classifies whether a pair of input real and generated samples corresponds to the desired emotion conversion (e.g., A to B) or to its inverse (B to A). We will show that this setup, which we refer to as a variational cycle-GAN (VC-GAN), is equivalent to minimizing the empirical KL divergence between the source features and their cyclic counterpart. In addition, our generator combines a trainable deep network with a fixed generative block to implement a smooth and invertible transformation on the input features, in our case, the fundamental frequency (F0) contour. This hybrid architecture regularizes our adversarial training procedure. We use crowd sourcing to evaluate both the emotional saliency and the quality of synthesized speech. Finally, we show that our model generalizes to new speakers by modifying speech produced by Wavenet.

Submitted 10 August, 2020; v1 submitted 25 July, 2020; originally announced July 2020.

Comments: Paper accepted in Interspeech 2020

arXiv:2001.01406 [pdf]

Analysis of Selective-Decode and Forward Relaying Protocol Over kappa-mu Fading Channel Distribution

Authors: Ravi Shankar, Lokesh Bhardwaj, Ritesh Kumar Mishra

Abstract: In this work, we examine the performance of selective-decode and forward (S-DF) relay systems over kappa-mu fading channel conditions. We discuss the probability density function (PDF), system model, and cumulative distribution function (CDF) of the kappa-mu distributed envelope and signal to noise ratio (SNR), and the techniques to generate samples that follow the kappa-mu distribution. Specifically, we consider the case where the source-to-relay (SR), relay-to-destination (RD) and source-to-destination (SD) links are subject to independent and identically distributed (i.i.d.) kappa-mu fading. From the simulation results, the enhancement in the symbol error rate (SER) with a stronger line of sight (LOS) component is observed. This shows that S-DF relaying systems can perform well even in the non-fading or LOS conditions. Monte Carlo simulations are conducted for various values of fading parameters and the outcomes closely match the theoretical outcomes, which validates the derivations.

Submitted 6 January, 2020; originally announced January 2020.

arXiv:1912.10036 [pdf, other]

A Family of Deep Learning Architectures for Channel Estimation and Hybrid Beamforming in Multi-Carrier mm-Wave Massive MIMO

Authors: Ahmet M. Elbir, Kumar Vijay Mishra, M. R. Bhavani Shankar, Björn Ottersten

Abstract: Hybrid analog and digital beamforming transceivers are instrumental in addressing the challenge of expensive hardware and high training overheads in the next generation millimeter-wave (mm-Wave) massive MIMO (multiple-input multiple-output) systems. However, lack of fully digital beamforming in hybrid architectures and short coherence times at mm-Wave impose additional constraints on the channel estimation. Prior works on addressing these challenges have focused largely on narrowband channels wherein optimization-based or greedy algorithms were employed to derive hybrid beamformers. In this paper, we introduce a deep learning (DL) approach for channel estimation and hybrid beamforming for frequency-selective, wideband mm-Wave systems. In particular, we consider a massive MIMO Orthogonal Frequency Division Multiplexing (MIMO-OFDM) system and propose three different DL frameworks comprising convolutional neural networks (CNNs), which accept the raw data of received signal as input and yield channel estimates and the hybrid beamformers at the output. We also introduce both offline and online prediction schemes. Numerical experiments demonstrate that, compared to the current state-of-the-art optimization and DL methods, our approach provides higher spectral efficiency, lower computational cost, fewer pilot signals, and higher tolerance against the deviations in the received pilot data, corrupted channel matrix, and propagation environment.

Submitted 3 January, 2022; v1 submitted 20 December, 2019; originally announced December 2019.

Comments: Accepted Paper in IEEE Transactions on Cognitive Communications and Networking. arXiv admin note: text overlap with arXiv:1910.14240

arXiv:1811.09850 [pdf]

Outage Probability Analysis of Selective-Decode and Forward Cooperative Wireless Network over Time Varying Fading Channels with Node Mobility and Imperfect CSI Condition

Authors: Ravi Shankar, Ritesh Kumar Mishra

Abstract: In this work, we explore the outage probability (OP) analysis of selective decode and forward (SDF) cooperation protocol employing multiple-input multiple-output (MIMO) orthogonal space-time block-code (OSTBC) over time varying Rayleigh fading channel conditions with imperfect channel state information (CSI) and mobile nodes. The closed-form expressions of the per-block average OP, probability distribution function (PDF) of sum of independent and identically distributed (i.i.d.) Gamma random variables (RVs), and cumulative distribution function (CDF) are derived and used to investigate the performance of the relaying network. A mathematical framework is developed to derive the optimal source-relay power allocation factors. It is shown that source node mobility affects the per-block average OP performance more significantly than the destination node mobility. Nevertheless, in other node mobility situations, cooperative systems are constrained by an error floor in the high signal to noise ratio (SNR) regime. Simulation results show that the equal power allocation is the only possible optimal solution when source to relay link is stronger than the relay to destination link. Also, we allocate almost all the power to the source node when source to relay link is weaker than the relay to destination link. Simulation results also show that OP simulated plots are in close agreement with the OP analytic plots at high SNR regimes.

Submitted 24 November, 2018; originally announced November 2018.

arXiv:1809.00654 [pdf]

PEP Analysis of Selective Decode and Forward Protocol over Keyhole Fading

Authors: Ravi Shankar, Yamini Chandrakar, Radhika Sinha, Ritesh Kumar Mishra

Abstract: We provide a closed-form upper bound for the average pairwise-error probability (PEP) of the selective decode and forward (SDF) cooperation protocol under a keyhole (pinhole) channel condition. We employ an orthogonal space-time block-code (OSTBC) scheme in conjunction with multi-antenna (MIMO) technology, and use a moment generating function (MGF) based approach to derive the upper bound of the PEP. The PEP expression provides information regarding the performance of the wireless system with respect to the channel conditions. We include simulation results which confirm the analytical results of our proposed upper bound. Simulation results show that the keyhole effect degrades the performance of the wireless system.

Submitted 10 September, 2018; v1 submitted 3 September, 2018; originally announced September 2018.

Comments: MICRO 2017

arXiv:1802.06270 [pdf, other]

MAVIS: Managing Datacenters using Smartphones

Authors: Raghav Shankar, Benjamin Kobin, Saurabh Bagchi, Michael Kistler, Jan Rellermeyer

Abstract: Distributed monitoring plays a crucial role in managing the activities of cloud-based datacenters. System administrators have long relied on monitoring systems such as Nagios and Ganglia to obtain status alerts on their desktop-class machines. However, the popularity of mobile devices is pushing the community to develop datacenter monitoring solutions for smartphone-class devices. Here we lay out desirable characteristics of such smartphone-based monitoring and identify quantitatively the shortcomings from directly applying existing solutions to this domain. Then we introduce a possible design that addresses some of these shortcomings and provide results from an early prototype, called MAVIS, using one month of monitoring data from approximately 3,000 machines hosted by Purdue's central IT organization.

Submitted 17 February, 2018; originally announced February 2018.

Comments: ACM Classification (2012): Data center networks; System management; Ubiquitous and mobile computing systems and tools

arXiv:1802.03958 [pdf, other]

doi 10.1109/MSP.2019.2894391

Signal Processing for High Throughput Satellite Systems: Challenges in New Interference-Limited Scenarios

Authors: Ana I. Perez-Neira, Miguel Angel Vazquez, Sina Maleki, M. R. Bhavani Shankar, Symeon Chatzinotas

Abstract: The field of satellite communications is enjoying a renewed interest in the global telecom market, and very high throughput satellites (V/HTS), with their multiple spot-beams, are key for delivering the future rate demands. In this article, the state-of-the-art and open research challenges of signal processing techniques for V/HTS systems are presented for the first time, with focus on novel approaches for efficient interference mitigation. The main signal processing topics for the ground, satellite, and user segment are addressed. Also, the critical components for the integration of satellite and terrestrial networks are studied, such as cognitive satellite systems and satellite-terrestrial backhaul for caching. All the reviewed techniques are essential in empowering satellite systems to support the increasing demands of the upcoming generation of communication networks.

Submitted 12 February, 2018; originally announced February 2018.

arXiv:1007.5165 [pdf]

doi 10.5121/iju.2010.1303

Security Enhancement With Optimal QOS Using EAP-AKA In Hybrid Coupled 3G-WLAN Convergence Network

Authors: R. Shankar, Timothy Rajkumar. K, P. Dananjayan

Abstract: The third generation partnership project (3GPP) has addressed the feasibility of interworking and specified the interworking architecture and security architecture for third generation (3G)-wireless local area network (WLAN). It is also developing the system architecture evolution (SAE)/long term evolution (LTE) architecture for the next-generation mobile communication system. To provide secure 3G-WLAN interworking in the SAE/LTE architecture, the extensible authentication protocol-authentication and key agreement (EAP-AKA) is used. However, EAP-AKA has several vulnerabilities. Therefore, this paper not only analyses the threats and attacks in 3G-WLAN interworking but also proposes a new authentication and key agreement protocol based on EAP-AKA. The proposed protocol combines elliptic curve Diffie-Hellman (ECDH) with a symmetric key cryptosystem to overcome the vulnerabilities. The proposed protocol is used in a hybrid coupled 3G-WLAN convergence network to analyse its efficiency in terms of QoS metrics. The results obtained using OPNET 14.5 show that the proposed protocol outperforms existing interworking protocols in both security and QoS.

Submitted 29 July, 2010; originally announced July 2010.

Comments: 12 pages, 5 figures

Journal ref: International Journal Of UbiComp 1.3 (2010) 31-42

arXiv:cs/0506032 [pdf]

Framework for Hopfield Network based Adaptive routing - A design level approach for adaptive routing phenomena with Artificial Neural Network

Authors: R. Shankar

Abstract: Routing, as a basic phenomenon, has by itself offered technocrats ample scope over the years to analyse, discuss and arrive at an optimal solution. Routing is analysed based on many factors; a few key constraints that decide the factors are communication medium, time dependency and the nature of the information source. Parametric routing has become the requirement of the day, with some kind of adaptation to the underlying network environment. Satellite constellations, particularly LEO satellite constellations, have become an operational reality, providing unbroken voice/data communication around the world. Routing in these constellations has to be treated in a non-conventional way, taking their network geometry into consideration. One of the efficient methods of optimization is putting neural networks to use. A few artificial neural network models are very suitable for the adaptive control mechanism, by the nature of their network arrangement. One such efficient model is the Hopfield network model. This paper is an attempt to design a framework for Hopfield network based adaptive routing in satellite constellations.

Submitted 10 June, 2005; originally announced June 2005.

Comments: (13 pages, 7 figures, code)

Showing 1–22 of 22 results for author: Shankar, R