Search | arXiv e-print repository

Low-Resource End-to-end Sanskrit TTS using Tacotron2, WaveGlow and Transfer Learning

Authors: Ankur Debnath, Shridevi S Patil, Gangotri Nadiger, Ramakrishnan Angarai Ganesan

Abstract: End-to-end text-to-speech (TTS) systems have been developed for European languages like English and Spanish with state-of-the-art speech quality, prosody, and naturalness. However, development of end-to-end TTS for Indian languages is lagging behind in terms of quality. The challenges involved in such a task are: 1) scarcity of quality training data; 2) low efficiency during training and inference; 3) slow convergence in the case of large vocabulary size. In our work reported in this paper, we have investigated the use of fine-tuning the English-pretrained Tacotron2 model with limited Sanskrit data to synthesize natural sounding speech in Sanskrit in low resource settings. Our experiments show encouraging results, achieving an overall MOS of 3.38 from 37 evaluators with good Sanskrit spoken knowledge. This is really a very good result, considering the fact that the speech data we have used is of duration 2.5 hours only.

Submitted 7 December, 2022; originally announced December 2022.

arXiv:2206.01475 [pdf, other]

Functional Connectivity Methods for EEG-based Biometrics on a Large, Heterogeneous Dataset

Authors: Pradeep Kumar G, Utsav Dutta, Kanishka Sharma, Ramakrishnan Angarai Ganesan

Abstract: This study examines the utility of functional connectivity (FC) and graph-based (GB) measures with a support vector machine classifier for use in electroencephalogram (EEG) based biometrics. Although FC-based features have been used in biometric applications, studies assessing the identification algorithms on heterogeneous and large datasets are scarce. This work investigates the performance of FC and GB metrics on a dataset of 184 subjects formed by pooling three datasets recorded under different protocols and acquisition systems. The results demonstrate the higher discriminatory power of FC than GB metrics. The identification accuracy increases with higher frequency EEG bands, indicating the enhanced uniqueness of the neural signatures in beta and gamma bands. Using all the 56 EEG channels common to the three databases, the best identification accuracy of 97.4% is obtained using phase-locking value (PLV) based measures extracted from the gamma frequency band. Further, we investigate the effect of the length of the analysis epoch to determine the data acquisition time required to obtain satisfactory identification accuracy. When the number of channels is reduced to 21 from 56, there is a marginal reduction of 2.4% only in the identification accuracy using PLV features in the gamma band. Additional experiments have been conducted to study the effect of the cognitive state of the subject and mismatched train/test conditions on the performance of the system.

Submitted 3 June, 2022; originally announced June 2022.

Comments: 11 pages, 5 figures and 7 Tables

Report number: MILE_2022_June01

arXiv:2201.09494 [pdf, other]

Data and knowledge-driven approaches for multilingual training to improve the performance of speech recognition systems of Indian languages

Authors: A. Madhavaraj, Ramakrishnan Angarai Ganesan

Abstract: We propose data and knowledge-driven approaches for multilingual training of the automated speech recognition (ASR) system for a target language by pooling speech data from multiple source languages. Exploiting the acoustic similarities between Indian languages, we implement two approaches. In phone/senone mapping, a deep neural network (DNN) learns to map senones or phones from one language to the others, and the transcriptions of the source languages are modified such that they can be used along with the target language data to train and fine-tune the target language ASR system. In the other approach, we model the acoustic information for all the languages simultaneously by training a multitask DNN (MTDNN) to predict the senones of each language in different output layers. The cross-entropy loss and the weight update procedure are modified such that only the shared layers and the output layer responsible for predicting the senone classes of a language are updated during training, if the feature vector belongs to that particular language. In the low-resource setting (LRS), 40 hours of transcribed data each for Tamil, Telugu and Gujarati languages are used for training. The DNN based senone mapping technique gives relative improvements in word error rates (WER) of 9.66%, 7.2% and 15.21% over the baseline system for Tamil, Gujarati and Telugu languages, respectively. In the medium-resourced setting (MRS), 160, 275 and 135 hours of data for Tamil, Kannada and Hindi languages are used, where the same technique gives better relative improvements of 13.94%, 10.28% and 27.24% for Tamil, Kannada and Hindi, respectively. The MTDNN with senone mapping based training in LRS gives higher relative WER improvements of 15.0%, 17.54% and 16.06%, respectively, for Tamil, Gujarati and Telugu, whereas in MRS, we see improvements of 21.24%, 21.05% and 30.17% for Tamil, Kannada and Hindi languages, respectively.

Submitted 24 January, 2022; originally announced January 2022.

arXiv:2101.05161 [pdf, other]

Comparative Analysis of Agent-Oriented Task Assignment and Path Planning Algorithms Applied to Drone Swarms

Authors: Rohith Gandhi Ganesan, Samantha Kappagoda, Giuseppe Loianno, David K. A. Mordecai

Abstract: Autonomous drone swarms are a burgeoning technology with significant applications in the field of mapping, inspection, transportation and monitoring. To complete a task, each drone has to accomplish a sub-goal within the context of the overall task at hand and navigate through the environment by avoiding collision with obstacles and with other agents in the environment. In this work, we choose the task of optimal coverage of an environment with drone swarms where global knowledge of the goal states and their positions is available but not of the obstacles. The drones have to choose the Points of Interest (PoI) present in the environment to visit, along with the order to be visited to ensure fast coverage. We model this task in a simulation and use an agent-oriented approach to solve the problem. We evaluate different policy networks trained with reinforcement learning algorithms based on their effectiveness, i.e. time taken to map the area, and efficiency, i.e. computational requirements. We couple the task assignment with path planning in a unique way for performing collision avoidance during navigation and compare a grid-based global planning algorithm, i.e. Wavefront, and a gradient-based local planning algorithm, i.e. Potential Field. We also evaluate the Potential Field planning algorithm with different cost functions, propose a method to adaptively modify the velocity of the drone when using the Huber loss function to perform collision avoidance and observe its effect on the trajectory of the drones. We demonstrate our experiments in 2D and 3D simulations.

Submitted 13 January, 2021; originally announced January 2021.

arXiv:1906.05929 [pdf, other]

Solving Large-Scale 0-1 Knapsack Problems and its Application to Point Cloud Resampling

Authors: Duanshun Li, Jing Liu, Noseong Park, Dongeun Lee, Giridhar Ramachandran, Ali Seyedmazloom, Kookjin Lee, Chen Feng, Vadim Sokolov, Rajesh Ganesan

Abstract: 0-1 knapsack is of fundamental importance in computer science, business, operations research, etc. In this paper, we present a deep learning technique-based method to solve large-scale 0-1 knapsack problems where the number of products (items) is large and/or the values of products are not necessarily predetermined but decided by an external value assignment function during the optimization process. Our solution is greatly inspired by the method of Lagrange multiplier and some recent adoptions of game theory to deep learning. After formally defining our proposed method based on them, we develop an adaptive gradient ascent method to stabilize its optimization process. In our experiments, the presented method solves all the large-scale benchmark KP instances in a minute whereas existing methods show fluctuating runtime. We also show that our method can be used for other applications, including but not limited to the point cloud resampling.

Submitted 11 June, 2019; originally announced June 2019.

arXiv:1906.00740 [pdf, other]

Design of a 5G Ready and Reliable Architecture for the Smart Factory of the Future

Authors: Mathias Strufe, Michael Gundall, Hans D. Schotten, Christian Markwart, Rakash S. Ganesan, Markus Aleksy

Abstract: The increasing demands for highly individual products as well as for flexible production lines represent new challenges. To address these demands, future plants must be highly flexible and dynamically reconfigurable. Current systems are usually based on wired technologies for the connection of sensors, actuators, and controlling or monitoring devices that allow only very limited dynamics. New applications, such as the use of robots, drones, or reconfigurable production lines, require the exploitation of wireless communication technologies. However, current technologies are not able to meet the high requirements in terms of latency, robustness, resilience and data rate. The introduction of the 5th generation (5G) cellular communication system will meet these requirements for the first time. Besides the use of radio-based solutions in new plants - so-called greenfield scenarios - deploying 5G also represents an efficient migration of existing plants - so-called brownfield scenarios - to Industry 4.0. In order to ensure that the challenging requirements are indeed met in practical deployments of the new 5G technology, a tailor-made architecture is being developed within the Tactile Internet 4.0 (TACNET 4.0) project. As a basis for the design of the architecture, representative Industry 4.0 application scenarios, which are also being considered by the 3rd Generation Partnership Project (3GPP), were analyzed, and compliance with the latest developments in the relevant standardization is also our target. The paper gives an overview of the considered use cases as well as the relevant reference architectures and the design process of the TACNET 4.0 architecture.

Submitted 13 May, 2019; originally announced June 2019.

Comments: Submitted to 24. ITG Fachtagung Mobilkommunikation

arXiv:1810.05921 [pdf, other]

Two Can Play That Game: An Adversarial Evaluation of a Cyber-alert Inspection System

Authors: Ankit Shah, Arunesh Sinha, Rajesh Ganesan, Sushil Jajodia, Hasan Cam

Abstract: Cyber-security is an important societal concern. Cyber-attacks have increased in numbers as well as in the extent of damage caused in every attack. Large organizations operate a Cyber Security Operation Center (CSOC), which forms the first line of cyber-defense. The inspection of cyber-alerts is a critical part of CSOC operations. A recent work, in collaboration with Army Research Lab, USA, proposed a reinforcement learning (RL) based approach to prevent the cyber-alert queue length from growing large and overwhelming the defender. Given the potential deployment of this approach to CSOCs run by US defense agencies, we perform a red team (adversarial) evaluation of this approach. Further, with the recent attacks on learning systems, it is even more important to test the limits of this RL approach. Towards that end, we learn an adversarial alert generation policy that is a best response to the defender inspection policy. Surprisingly, we find the defender policy to be quite robust to the best response of the attacker. In order to explain this observation, we extend the earlier RL model to a game model and show that there exist defender policies that can be robust against any adversarial policy. We also derive a competitive baseline from the game theory model and compare it to the RL approach. However, we go further to exploit assumptions made in the MDP in the RL model and discover an attacker policy that overwhelms the defender. We use a double oracle approach to retrain the defender with episodes from this discovered attacker policy. This made the defender robust to the discovered attacker policy and no further harmful attacker policies were discovered. Overall, the adversarial RL and double oracle approach in RL are general techniques that are applicable to other RL usage in adversarial environments.

Submitted 13 October, 2018; originally announced October 2018.

arXiv:1712.08735 [pdf, other]

Quantized Precoding for Multi-Antenna Downlink Channels with MAGIQ

Authors: Andrei Nedelcu, Fabian Steiner, Markus Staudacher, Gerhard Kramer, Wolfgang Zirwas, Rakash Sivasiva Ganesan, Paolo Baracca, Stefan Wesemann

Abstract: A multi-antenna, greedy, iterative, and quantized (MAGIQ) precoding algorithm is proposed for downlink channels. MAGIQ allows a straightforward integration with orthogonal frequency-division multiplexing (OFDM). MAGIQ is compared to three existing algorithms in terms of information rates and complexity: quantized linear precoding (QLP), SQUID, and an ADMM-based algorithm. The information rate is measured by using a lower bound for finite modulation sets, and the complexity is measured by the number of multiplications and comparisons. MAGIQ and ADMM achieve similar information rates with similar complexity for Rayleigh flat-fading channels and one-bit quantization per real dimension, and they outperform QLP and SQUID for higher order modulation.

Submitted 2 March, 2018; v1 submitted 23 December, 2017; originally announced December 2017.

Comments: extended abstract

arXiv:1702.02414 [pdf, other]

Constructing Receiver Signal Points using Constrained Massive MIMO Arrays

Authors: Markus Staudacher, Gerhard Kramer, Wolfgang Zirwas, Berthold Panzner, Rakash Sivasiva Ganesan

Abstract: A low cost solution for constructing receiver signal points is investigated that combines a large number of constrained radio frequency (RF) frontends with a limited number of full RF chains. The constrained RF front ends have low cost and are limited to on/off switching of antenna elements and a small number of phases. Severe degradations are typically observed for multi-user MIMO for these simple on/off antenna arrays. A few full RF frontends are shown to compensate for the signal errors of the high number of constrained RF frontends for various scenarios. An algorithm for such a hybrid RF (HRF) system is developed that achieves performance close to that of exhaustive search with respect to the mean square error of the constructed receiver signals for Rayleigh fading and the WINNER 2 Urban Macro channel model.

Submitted 8 February, 2017; originally announced February 2017.

arXiv:1503.06101 [pdf, ps, other]

Maximizing the Sum Rate in Cellular Networks Using Multi-Convex Optimization

Authors: Hussein Al-Shatri, Xiang Li, Rakash SivaSiva Ganesan, Anja Klein, Tobias Weber

Abstract: In this paper, we propose a novel algorithm to maximize the sum rate in interference-limited scenarios where each user decodes its own message in the presence of unknown interferences and noise, considering the signal-to-interference-plus-noise-ratio. It is known that the problem of adapting the transmit and receive filters of the users to maximize the sum rate with a sum transmit power constraint is non-convex. Our novel approach is to formulate the sum rate maximization problem as an equivalent multi-convex optimization problem by adding two sets of auxiliary variables. An iterative algorithm which alternatingly adjusts the system variables and the auxiliary variables is proposed to solve the multi-convex optimization problem. The proposed algorithm is applied to a downlink cellular scenario consisting of several cells each of which contains a base station serving several mobile stations. We examine the two cases, with or without several half-duplex amplify-and-forward relays assisting the transmission. A sum power constraint at the base stations and a sum power constraint at the relays are assumed. Finally, we show that the proposed multi-convex formulation of the sum rate maximization problem is applicable to many other wireless systems in which the estimated data symbols are multi-affine functions of the system variables.

Submitted 20 March, 2015; originally announced March 2015.

Comments: 24 pages, 5 figures

Showing 1–10 of 10 results for author: Ganesan, R