Straight Through Gumbel Softmax Estimator based Bimodal Neural Architecture Search for Audio-Visual Deepfake Detection

Authors: Aravinda Reddy PN, Raghavendra Ramachandra, Krothapalli Sreenivasa Rao, Pabitra Mitra, Vinod Rathod

Abstract: Deepfakes are a major security risk for biometric authentication. This technology creates realistic fake videos that can impersonate real people, fooling systems that rely on facial features and voice patterns for identification. Existing multimodal deepfake detectors rely on conventional fusion methods, such as majority rule and ensemble voting, which often struggle to adapt to changing data characteristics and complex patterns. In this paper, we introduce the Straight-through Gumbel-Softmax (STGS) framework, offering a comprehensive approach to search multimodal fusion model architectures. Using a two-level search approach, the framework optimizes the network architecture, parameters, and performance. Initially, crucial features were efficiently identified from backbone networks, whereas within the cell structure, a weighted fusion operation integrated information from various sources. An architecture that maximizes the classification performance is derived by varying parameters such as temperature and sampling time. The experimental results on the FakeAVCeleb and SWAN-DF datasets demonstrated an impressive AUC value of 94.4%, achieved with minimal model parameters.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2404.12679 [pdf, other]

MLSD-GAN -- Generating Strong High Quality Face Morphing Attacks using Latent Semantic Disentanglement

Authors: Aravinda Reddy PN, Raghavendra Ramachandra, Krothapalli Sreenivasa Rao, Pabitra Mitra

Abstract: Face-morphing attacks are a growing concern for biometric researchers, as they can be used to fool face recognition systems (FRS). These attacks can be generated at the image level (supervised) or representation level (unsupervised). Previous unsupervised morphing attacks have relied on generative adversarial networks (GANs). More recently, researchers have used linear interpolation of StyleGAN-encoded images to generate morphing attacks. In this paper, we propose a new method for generating high-quality morphing attacks using StyleGAN disentanglement. Our approach, called MLSD-GAN, spherically interpolates the disentangled latents to produce realistic and diverse morphing attacks. We evaluate the vulnerability of MLSD-GAN on two deep-learning-based FRS techniques. The results show that MLSD-GAN poses a significant threat to FRS, as it can generate morphing attacks that are highly effective at fooling these systems.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2402.16660 [pdf, other]

doi 10.1145/3408890

BOXREC: Recommending a Box of Preferred Outfits in Online Shopping

Authors: Debopriyo Banerjee, Krothapalli Sreenivasa Rao, Shamik Sural, Niloy Ganguly

Abstract: Over the past few years, automation of outfit composition has gained much attention from the research community. Most of the existing outfit recommendation systems focus on pairwise item compatibility prediction (using visual and text features) to score an outfit combination having several items, followed by recommendation of top-n outfits or a capsule wardrobe having a collection of outfits based on user's fashion taste. However, none of these consider user's preference of price-range for individual clothing types or an overall shopping budget for a set of items. In this paper, we propose a box recommendation framework - BOXREC - which at first, collects user preferences across different item types (namely, top-wear, bottom-wear and foot-wear) including price-range of each type and a maximum shopping budget for a particular shopping session. It then generates a set of preferred outfits by retrieving all types of preferred items from the database (according to user specified preferences including price-ranges), creates all possible combinations of three preferred items (belonging to distinct item types) and verifies each combination using an outfit scoring framework - BOXREC-OSF. Finally, it provides a box full of fashion items, such that different combinations of the items maximize the number of outfits suitable for an occasion while satisfying maximum shopping budget. Empirical results show superior performance of BOXREC-OSF over the baseline methods.

Submitted 26 February, 2024; originally announced February 2024.

Journal ref: ACM Trans. Intell. Syst. Technol. 11, 6, Article 69 (December 2020), pages 69:1-69:28

arXiv:2401.01356 [pdf, other]

Efficient Indexing of Meta-Data (Extracted from Educational Videos)

Authors: Shalika Kumbham, Abhijit Debnath, Krothapalli Sreenivasa Rao

Abstract: Video lectures are becoming more popular and in demand as online classroom teaching is becoming more prevalent. Massive Open Online Courses (MOOCs), such as NPTEL, have been creating high-quality educational content that is freely accessible to students online. A large number of colleges across the country are now using NPTEL videos in their classrooms. So more video lectures are being recorded, maintained, and uploaded. These videos generally contain information about that video before the lecture begins. We generally observe that these educational videos have metadata containing five to six attributes: Institute Name, Publisher Name, Department Name, Professor Name, Subject Name, and Topic Name. It would be easy to maintain these videos if we could organize them according to their categories. The indexing of these videos based on this information is beneficial for students all around the world to efficiently utilise these videos. In this project, we are trying to get the metadata information mentioned above from the video lectures.

Submitted 11 December, 2023; originally announced January 2024.

arXiv:2312.01744 [pdf, other]

doi 10.1109/WASPAA58266.2023.10248144

SEFGAN: Harvesting the Power of Normalizing Flows and GANs for Efficient High-Quality Speech Enhancement

Authors: Martin Strauss, Nicola Pia, Nagashree K. S. Rao, Bernd Edler

Abstract: This paper proposes SEFGAN, a Deep Neural Network (DNN) combining maximum likelihood training and Generative Adversarial Networks (GANs) for efficient speech enhancement (SE). For this, a DNN is trained to synthesize the enhanced speech conditioned on noisy speech using a Normalizing Flow (NF) as generator in a GAN framework. While the combination of likelihood models and GANs is not trivial, SEFGAN demonstrates that a hybrid adversarial and maximum likelihood training approach enables the model to maintain high quality audio generation and log-likelihood estimation. Our experiments indicate that this approach strongly outperforms the baseline NF-based model without introducing additional complexity to the enhancement network. A comparison using computational metrics and a listening experiment reveals that SEFGAN is competitive with other state-of-the-art models.

Submitted 4 December, 2023; originally announced December 2023.

Comments: Preprint. Accepted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2023

arXiv:2310.15071 [pdf, other]

doi 10.1016/j.physrep.2023.09.008

Experimental signatures of quantum and topological states in frustrated magnetism

Authors: J. Khatua, B. Sana, A. Zorko, M. Gomilšek, K. Sethupathi, M. S. Ramachandra Rao, M. Baenitz, B. Schmidt, P. Khuntia

Abstract: Frustration in magnetic materials arising from competing exchange interactions can prevent the system from adopting long-range magnetic order and can instead lead to a diverse range of novel quantum and topological states with exotic quasiparticle excitations. Here, we review prominent examples of such emergent phenomena, including magnetically-disordered and extensively degenerate spin ices, which feature emergent magnetic monopole excitations, highly-entangled quantum spin liquids with fractional spinon excitations, topological order and emergent gauge fields, as well as complex particle-like topological spin textures known as skyrmions. We provide an overview of recent advances in the search for magnetically-disordered candidate materials on the three-dimensional pyrochlore lattice and two-dimensional triangular, kagome and honeycomb lattices, the latter with bond-dependent Kitaev interactions, and on lattices supporting topological magnetism. We highlight experimental signatures of these often elusive phenomena and single out the most suitable experimental techniques that can be used to detect them. Our review also aims at providing a comprehensive guide for designing and investigating novel frustrated magnetic materials, with the potential of addressing some important open questions in contemporary condensed matter physics.

Submitted 15 November, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

Journal ref: Physics Reports 1041, 1 (2023)

arXiv:2310.12736 [pdf, other]

ExtSwap: Leveraging Extended Latent Mapper for Generating High Quality Face Swapping

Authors: Aravinda Reddy PN, K. Sreenivasa Rao, Raghavendra Ramachandra, Pabitra Mitra

Abstract: We present a novel face swapping method using the progressively growing structure of a pre-trained StyleGAN. Previous methods use different encoder-decoder structures and embedding integration networks to produce high-quality results, but their quality suffers from entangled representation. We disentangle semantics by deriving identity and attribute features separately. By learning to map the concatenated features into the extended latent space, we leverage the state-of-the-art quality and its rich semantic extended latent space. Extensive experiments suggest that the proposed method successfully disentangles identity and attribute features and outperforms many state-of-the-art face swapping methods, both qualitatively and quantitatively.

Submitted 19 October, 2023; originally announced October 2023.

arXiv:2202.01078 [pdf, other]

Melody Extraction from Polyphonic Music by Deep Learning Approaches: A Review

Authors: Gurunath Reddy M, K. Sreenivasa Rao, Partha Pratim Das

Abstract: Melody extraction is a vital music information retrieval task among music researchers for its potential applications in education pedagogy and the music industry. Melody extraction is a notoriously challenging task due to the presence of background instruments. Also, the melodic source often exhibits characteristics similar to those of the other instruments. The interfering background accompaniment with the vocals makes extracting the melody from the mixture signal much more challenging. Until recently, classical signal processing-based melody extraction methods were quite popular among melody extraction researchers. The ability of the deep learning models to model large-scale data and the ability of the models to learn automatic features by exploiting spatial and temporal dependencies inspired many researchers to adopt deep learning models for melody extraction. In this paper, an attempt has been made to review the up-to-date data-driven deep learning approaches for melody extraction from polyphonic music. The available deep models have been categorized based on the type of neural network used and the output representation they use for predicting melody. Further, the architectures of the 25 melody extraction models are briefly presented. The loss functions used to optimize the model parameters of the melody extraction models are broadly grouped into four categories, and the loss functions used by various melody extraction models are briefly described. Also, the various input representations adopted by the melody extraction models and the parameter settings are described in detail. A section describing the explainability of the black-box melody extraction deep neural networks is included. The performance of 25 melody extraction methods is compared. The possible future directions to explore/improve the melody extraction methods are also presented in the paper.

Submitted 2 February, 2022; originally announced February 2022.

Comments: 72 pages

arXiv:2112.04841 [pdf, other]

On The Effect Of Coding Artifacts On Acoustic Scene Classification

Authors: Nagashree K. S. Rao, Nils Peters

Abstract: Previous DCASE challenges contributed to an increase in the performance of acoustic scene classification systems. State-of-the-art classifiers demand significant processing capabilities and memory which is challenging for resource-constrained mobile or IoT edge devices. Thus, it is more likely to deploy these models on more powerful hardware and classify audio recordings previously uploaded (or streamed) from low-power edge devices. In such a scenario, the edge device may apply perceptual audio coding to reduce the transmission data rate. This paper explores the effect of perceptual audio coding on the classification performance using a DCASE 2020 challenge contribution [1]. We found that classification accuracy can degrade by up to 57% compared to classifying original (uncompressed) audio. We further demonstrate how lossy audio compression techniques during model training can improve classification accuracy of compressed audio signals even for audio codecs and codec bitrates not included in the training process.

Submitted 9 December, 2021; originally announced December 2021.

Comments: paper presented at the 2021 Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE)

arXiv:2109.04138 [pdf, other]

Multilingual Audio-Visual Smartphone Dataset And Evaluation

Authors: Hareesh Mandalapu, Aravinda Reddy P N, Raghavendra Ramachandra, K Sreenivasa Rao, Pabitra Mitra, S R Mahadeva Prasanna, Christoph Busch

Abstract: Smartphones have been employed with biometric-based verification systems to provide security in highly sensitive applications. Audio-visual biometrics are gaining popularity due to their usability, and they are challenging to spoof because of their multimodal nature. In this work, we present an audio-visual smartphone dataset captured with five different recent smartphones. This new dataset contains 103 subjects captured in three different sessions considering the different real-world scenarios. Three different languages are acquired in this dataset to include the problem of language dependency of the speaker recognition systems. These unique characteristics of this dataset will pave the way to implement novel state-of-the-art unimodal or audio-visual speaker recognition systems. We also report the performance of the bench-marked biometric verification systems on our dataset. The robustness of biometric algorithms is evaluated towards multiple dependencies like signal noise, device, language and presentation attacks like replay and synthesized signals with extensive experiments. The obtained results raised many concerns about the generalization properties of state-of-the-art biometrics methods in smartphones.

Submitted 15 November, 2021; v1 submitted 9 September, 2021; originally announced September 2021.

arXiv:2101.09725 [pdf]

doi 10.1109/ACCESS.2021.3063031

Audio-Visual Biometric Recognition and Presentation Attack Detection: A Comprehensive Survey

Authors: Hareesh Mandalapu, P N Aravinda Reddy, Raghavendra Ramachandra, K Sreenivasa Rao, Pabitra Mitra, S R Mahadeva Prasanna, Christoph Busch

Abstract: Biometric recognition is a trending technology that uses unique characteristics data to identify or verify/authenticate security applications. Amidst the classically used biometrics, voice and face attributes are the most propitious for prevalent applications in day-to-day life because they are easy to obtain through restrained and user-friendly procedures. The pervasiveness of low-cost audio and face capture sensors in smartphones, laptops, and tablets has made the advantage of voice and face biometrics more exceptional when compared to other biometrics. For many years, acoustic information alone has been a great success in automatic speaker verification applications. Meanwhile, the last decade or two has also witnessed a remarkable ascent in face recognition technologies. Nonetheless, in adverse unconstrained environments, neither of these techniques achieves optimal performance. Since audio-visual information carries correlated and complementary information, integrating them into one recognition system can increase the system's performance. The vulnerability of biometrics towards presentation attacks and audio-visual data usage for the detection of such attacks is also a hot topic of research. This paper presents a comprehensive survey of existing state-of-the-art audio-visual recognition techniques, publicly available databases for benchmarking, and Presentation Attack Detection (PAD) algorithms. Further, a detailed discussion on challenges and open problems is presented in this field of biometrics.

Submitted 12 March, 2021; v1 submitted 24 January, 2021; originally announced January 2021.

Journal ref: in IEEE Access, vol. 9, pp. 37431-37455, 2021

arXiv:2011.06455 [pdf]

doi 10.1098/rsos.210429

Optimal governance and implementation of vaccination programmes to contain the COVID-19 pandemic

Authors: Mahendra Piraveenan, Shailendra Sawleshwarkar, Michael Walsh, Iryna Zablotska, Samit Bhattacharyya, Habib Hassan Farooqui, Tarun Bhatnagar, Anup Karan, Manoj Murhekar, Sanjay Zodpey, K. S. Mallikarjuna Rao, Philippa Pattison, Albert Zomaya, Matjaz Perc

Abstract: Since the recent introduction of several viable vaccines for SARS-CoV-2, vaccination uptake has become the key factor that will determine our success in containing the COVID-19 pandemic. We argue that game theory and social network models should be used to guide decisions pertaining to vaccination programmes for the best possible results. In the months following the introduction of vaccines, their availability and the human resources needed to run the vaccination programmes have been scarce in many countries. Vaccine hesitancy is also being encountered from some sections of the general public. We emphasize that decision-making under uncertainty and imperfect information, and with only conditionally optimal outcomes, is a unique forte of established game-theoretic modelling. Therefore, we can use this approach to obtain the best framework for modelling and simulating vaccination prioritization and uptake that will be readily available to inform important policy decisions for the optimal control of the COVID-19 pandemic.

Submitted 9 June, 2021; v1 submitted 12 November, 2020; originally announced November 2020.

Comments: 15 pages, 1 figure; published in Royal Society Open Science

Journal ref: R. Soc. Open Sci. 8, 210429 (2021)

arXiv:2011.04297 [pdf, other]

Knowledge Distillation for Singing Voice Detection

Authors: Soumava Paul, Gurunath Reddy M, K Sreenivasa Rao, Partha Pratim Das

Abstract: Singing Voice Detection (SVD) has been an active area of research in music information retrieval (MIR). Currently, two deep neural network-based methods, one based on CNN and the other on RNN, exist in literature that learn optimized features for the voice detection (VD) task and achieve state-of-the-art performance on common datasets. Both these models have a huge number of parameters (1.4M for CNN and 65.7K for RNN) and hence are not suitable for deployment on devices like smartphones or embedded sensors with limited capacity in terms of memory and computation power. The most popular method to address this issue is known as knowledge distillation in deep learning literature (in addition to model compression) where a large pre-trained network known as the teacher is used to train a smaller student network. Given the wide applications of SVD in music information retrieval, to the best of our knowledge, model compression for practical deployment has not yet been explored. In this paper, efforts have been made to investigate this issue using both conventional as well as ensemble knowledge distillation techniques.

Submitted 19 August, 2021; v1 submitted 9 November, 2020; originally announced November 2020.

Comments: Accepted at INTERSPEECH 2021. 5 pages, 3 figures

arXiv:1909.03974 [pdf, other]

doi 10.1007/s11063-019-10149-y

DNN-based cross-lingual voice conversion using Bottleneck Features

Authors: M Kiran Reddy, K Sreenivasa Rao

Abstract: Cross-lingual voice conversion (CLVC) is a quite challenging task since the source and target speakers speak different languages. This paper proposes a CLVC framework based on bottleneck features and deep neural network (DNN). In the proposed method, the bottleneck features extracted from a deep auto-encoder (DAE) are used to represent speaker-independent features of speech signals from different languages. A DNN model is trained to learn the mapping between bottleneck features and the corresponding spectral features of the target speaker. The proposed method can capture speaker-specific characteristics of a target speaker, and hence requires no speech data from source speaker during training. The performance of the proposed method is evaluated using data from three Indian languages: Telugu, Tamil and Malayalam. The experimental results show that the proposed method outperforms the baseline Gaussian mixture model (GMM)-based CLVC approach.

Submitted 10 September, 2019; v1 submitted 9 September, 2019; originally announced September 2019.

arXiv:1908.09634 [pdf, ps, other]

Multilingual and Multimode Phone Recognition System for Indian Languages

Authors: Kumud Tripathi, M. Kiran Reddy, K. Sreenivasa Rao

Abstract: The aim of this paper is to develop a flexible framework capable of automatically recognizing phonetic units present in a speech utterance of any language spoken in any mode. In this study, we considered two modes of speech: conversation, and read modes in four Indian languages, namely, Telugu, Kannada, Odia, and Bengali. The proposed approach consists of two stages: (1) Automatic speech mode classification (SMC) and (2) Automatic phonetic recognition using mode-specific multilingual phone recognition system (MPRS). In this work, the vocal tract and excitation source features are considered for speech mode classification (SMC) task. SMC systems are developed using multilayer perceptron (MLP). Further, vocal tract, excitation source, and tandem features are used to build the deep neural network (DNN)-based MPRSs. The performance of the proposed approach is compared with mode-dependent MPRSs. Experimental results show that the proposed approach which combines both SMC and MPRS into a single system outperforms the baseline mode-dependent MPRSs.

Submitted 23 August, 2019; originally announced August 2019.

Comments: 33 pages, 5 figures, 6 tables, article

arXiv:1908.08668 [pdf, ps, other]

VOP Detection for Read and Conversation Speech using CWT Coefficients and Phone Boundaries

Authors: Kumud Tripathi, K. Sreenivasa Rao

Abstract: In this paper, we propose a novel approach for accurate detection of the vowel onset points (VOPs). VOP is the instant at which the vowel begins in the speech signal. Precise identification of VOPs is important for various speech applications such as speech segmentation and speech rate modification. The existing methods detect the majority of VOPs within 40 ms deviation, which may not be appropriate for the above speech applications. To address this issue, we propose a two-stage approach for accurate detection of VOPs. At the first stage, VOPs are detected using continuous wavelet transform coefficients, and the positions of the detected VOPs are corrected using the phone boundaries in the second stage. The phone boundaries are detected by the spectral transition measure method. Experiments are done using TIMIT and Bengali speech corpora. Performance of the proposed approach is compared with two standard signal processing based methods. The evaluation results show that the proposed method performs better than the existing methods.

Submitted 23 August, 2019; originally announced August 2019.

Comments: 21 pages, 8 figures, 4 tables, article

arXiv:1904.09765 [pdf, other]

hf0: A hybrid pitch extraction method for multimodal voice

Authors: Pradeep Rengaswamy, Gurunath Reddy M, Krothapalli Sreenivasa Rao

Abstract: Pitch or fundamental frequency (f0) extraction is a fundamental problem studied extensively for its potential applications in speech and clinical applications. In literature, explicit mode specific (modal speech or singing voice or emotional/expressive speech or noisy speech) signal processing and deep learning f0 extraction methods that exploit the quasi periodic nature of the signal in time, harmonic property in spectral or combined form to extract the pitch have been developed. Hence, there is no single unified method which can reliably extract the pitch from various modes of the acoustic signal. In this work, we propose a hybrid f0 extraction method which seamlessly extracts the pitch across modes of speech production with very high accuracy required for many applications. The proposed hybrid model exploits the advantages of deep learning and signal processing methods to minimize the pitch detection error and adapts to various modes of acoustic signal. Specifically, we propose an ordinal regression convolutional neural network to map the periodicity rich input representation to obtain the nominal pitch classes which drastically reduces the number of classes required for pitch detection unlike other deep learning approaches. Further, the accurate f0 is estimated from the nominal pitch class labels by filtering and autocorrelation. We show that the proposed method generalizes to the unseen modes of voice production and various noises for large scale datasets. Also, the proposed hybrid model significantly reduces the learning parameters required to train the deep model compared to other methods. Furthermore, the evaluation measures showed that the proposed method is significantly better than the state-of-the-art signal processing and deep learning approaches.

Submitted 22 April, 2019; originally announced April 2019.

Comments: Pitch Extraction, F0 extraction, harmonic signals, speech, monophonic songs, Convolutional Neural Network, 5 pages, 5 figures

arXiv:1811.09956 [pdf, other]

Glottal Closure Instants Detection From Pathological Acoustic Speech Signal Using Deep Learning

Authors: Gurunath Reddy M, Tanumay Mandal, Krothapalli Sreenivasa Rao

Abstract: In this paper, we propose classification-based glottal closure instant (GCI) detection from the pathological acoustic speech signal, which finds many applications in vocal disorder analysis. To date, GCIs for pathological disorders have been extracted from the laryngeal (glottal source) signal recorded with an Electroglottograph, a dedicated device designed to measure the vocal fold vibration around the larynx. We have created a pathological dataset which consists of simultaneous recordings of the glottal source and acoustic speech signals of six different disorders from patients with vocal disorders. The GCI locations are manually annotated for disorder analysis and supervised learning. We have proposed a convolutional neural network based GCI detection method that fuses deep acoustic speech and linear prediction residual features for robust GCI detection. The experimental results showed that the proposed method is significantly better than the state-of-the-art GCI detection methods.

Submitted 25 November, 2018; originally announced November 2018.

Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

Report number: ML4H/2018/39

arXiv:1807.07710 [pdf, ps, other]

Multivariate Public Key Cryptography and Digital Signature

Authors: Pulugurtha Krishna Subba Rao, Duggirala Meher Krishna, Duggirala Ravi

Abstract: In this paper, algorithms for multivariate public key cryptography and digital signature are described. Plain messages and encrypted messages are arrays, consisting of elements from a fixed finite ring or field. The encryption and decryption algorithms are based on multivariate mappings. The security of the private key depends on the difficulty of solving a system of parametric simultaneous multivariate equations involving polynomial or exponential mappings. The method is a general purpose utility for most data encryption, digital certificate or digital signature applications. For security protocols of the application layer level in the OSI model, the methods described in this paper are useful.

Submitted 23 July, 2018; v1 submitted 20 July, 2018; originally announced July 2018.

Comments: arXiv admin note: substantial text overlap with arXiv:1608.06472

MSC Class: 03C10; 11C08; 11T71; 12E20; 12Y05; 13A15; 13P10; 81P94; 94A60

arXiv:1706.05879 [pdf, other]

Competing Ferromagnetic and Anti-Ferromagnetic interactions in Iron Nitride $ζ$-Fe$_2$N

Authors: K. Sandeep Rao, H. G. Salunke

Abstract: The paper discusses the magnetic state of zeta phase of iron nitride viz. $ζ$-Fe$_2$N on the basis of spin polarized first principles electronic structure calculations together with a review of already published data. Results of our first principles study suggest that the ground state of $ζ$-Fe$_2$N is ferromagnetic (FM) with a magnetic moment of 1.528 $μ_\text{B}$ on the Fe site. The FM ground state is lower than the anti-ferromagnetic (AFM) state by 8.44 meV and non-magnetic (NM) state by 191 meV per formula unit. These results are important in view of reports which claim that $ζ$-Fe$_2$N undergoes an AFM transition below 10 K and others which do not observe any magnetic transition up to 4.2 K. We argue that the experimental results of AFM transition below 10 K are inconclusive and we propose the presence of competing FM and AFM superexchange interactions between Fe sites mediated by nitrogen atoms, which are consistent with Goodenough-Kanamori-Anderson rules. We find that the anti-ferromagnetically coupled Fe sites are outnumbered by ferromagnetically coupled Fe sites leading to a stable FM ground state. A Stoner analysis of the results also supports our claim of a FM ground state.

Submitted 19 June, 2017; originally announced June 2017.

Comments: 10 pages, 7 figures, 3 tables

arXiv:1605.07544 [pdf, ps, other]

Evolutionary Stability of Polymorphic Population States in Continuous Games

Authors: Dharini Hingu, K. S. Mallikarjuna Rao, A. J. Shaiju

Abstract: In games with continuous strategy spaces, if a rest point of the replicator dynamics is asymptotically stable then the rest point must be finitely supported (van Veelen, M., Spreij, P., 2009. Evolution in games with a continuous action space. Econom. Theory 39 (3), 355-376). In this article, we address the converse question; that is, we prove that a finitely supported population state is asymptotically stable with respect to the variational norm when it is strongly uninvadable.

Submitted 24 May, 2016; originally announced May 2016.

Comments: 19 Pages

arXiv:1602.08834 [pdf, other]

doi 10.1016/j.sysconle.2016.05.002

Characterization of maximum hands-off control

Authors: Debasish Chatterjee, Masaaki Nagahara, Daniel Quevedo, K. S. Mallikarjuna Rao

Abstract: Maximum hands-off control aims to maximize the length of time over which zero actuator values are applied to a system when executing specified control tasks. To tackle such problems, recent literature has investigated optimal control problems which penalize the size of the support of the control function and thereby lead to desired sparsity properties. This article gives the exact set of necessary conditions for a maximum hands-off optimal control problem using an $L_0$-(semi)norm, and also provides sufficient conditions for the optimality of such controls. A numerical example illustrates that adopting an $L_0$ cost leads to a sparse control, whereas an $L_1$-relaxation in singular problems leads to a non-sparse solution.

Submitted 29 February, 2016; originally announced February 2016.

Comments: 6 pages

Journal ref: Systems & Control Letters, Vol. 94, pp. 31-36, 2016

arXiv:1409.4962 [pdf]

Facile preparation of agarose-chitosan hybrid materials and nanocomposite ionogels using an ionic liquid via dissolution, regeneration and sol-gel transition

Authors: Tushar J. Trivedi, K. Srinivasa Rao, Arvind Kumar

Abstract: We report simultaneous dissolution of agarose (AG) and chitosan (CH) in varying proportions in an ionic liquid (IL), 1-butyl-3-methylimidazolium chloride [C4mim][Cl]. Composite materials were constructed from AG-CH-IL solutions using the antisolvent methanol, and IL was recovered from the solutions. Composite materials could be uniformly decorated with silver oxide (Ag2O) nanoparticles (Ag NPs) to form nanocomposites in a single step by in situ synthesis of Ag NPs in AG-CH-IL sols, wherein the biopolymer moiety acted as both reducing and stabilizing agent. Cooling of Ag NPs-AG-CH-IL sols to room temperature resulted in high conductivity and high mechanical strength nanocomposite ionogels. The structure, stability and physicochemical properties of composite materials and nanocomposites were characterized by several analytical techniques, such as Fourier transform infrared (FTIR) spectroscopy, CD spectroscopy, differential scanning calorimetry (DSC), thermogravimetric analysis (TGA), gel permeation chromatography (GPC), and scanning electron micrography (SEM). The results show that composite materials have good thermal and conformational stability, compatibility and strong hydrogen bonding interactions between AG-CH complexes. Decoration of Ag NPs in composites and ionogels was confirmed by UV-Vis spectroscopy, SEM, TEM, EDAX and XRD. The mechanical and conducting properties of composite ionogels have been characterized by rheology and current-voltage measurements. Since Ag NPs show good antimicrobial activity, Ag NPs-AG-CH composite materials have the potential to be used in biotechnology and biomedical applications, whereas nanocomposite ionogels will be suitable as precursors for applications such as quasi-solid dye sensitized solar cells, actuators, sensors or electrochromic displays.

Submitted 17 September, 2014; originally announced September 2014.

arXiv:1405.2049 [pdf, other]

A New Upperbound for the Oblivious Transfer Capacity of Discrete Memoryless Channels

Authors: K. Sankeerth Rao, Vinod M. Prabhakaran

Abstract: We derive a new upper bound on the string oblivious transfer capacity of discrete memoryless channels. The main tool we use is the tension region of a pair of random variables introduced in Prabhakaran and Prabhakaran (2014) where it was used to derive upper bounds on rates of secure sampling in the source model. In this paper, we consider secure computation of string oblivious transfer in the channel model. Our bound is based on a monotonicity property of the tension region in the channel model. We show that our bound strictly improves upon the upper bound of Ahlswede and Csiszár (2013).

Submitted 8 May, 2014; originally announced May 2014.

Comments: 7 pages, 3 figures, extended version of submission to IEEE Information Theory Workshop, 2014

arXiv:1209.4157 [pdf, ps, other]

AutoAmp : An Open-Source Analog Amplifier Design Tool - For Classroom and Lab Purposes

Authors: Om Prasad Patri, K. Sanmukh Rao

Abstract: This correspondence presents an open-source tool, AutoAmp, developed at the Indian Institute of Technology, Guwahati. It is available at http://sourceforge.net/projects/autoamp-iitg/ and helps the user to design different types of electronic amplifiers, using solid state devices, for a given specification. It can handle several types of designs, namely common-emitter BJT amplifiers (single and two-stage), operational amplifiers (inverting and non-inverting) and power amplifiers. Not only does it design the amplifier, it also simulates the designed amplifier using the SPICE simulator and displays the performance curves. This tool is expected to prove invaluable in undergraduate teaching and labs. Especially in electronics-design related laboratories, the student need not design the amplifiers, which are mostly the heart of many electronic designs.

Submitted 19 September, 2012; originally announced September 2012.

Comments: presented at the Indian Conference for Academic Research by Undergraduate Students (ICARUS), 2010, IIT Kanpur; AutoAmp : An Open-Source Analog Amplifier Design Tool - For Classroom and Lab Purposes, Proceedings of the Indian Conference for Academic Research by Undergraduate Students (ICARUS), 2010

arXiv:1201.2467 [pdf, other]

doi 10.1007/s13235-012-0051-x

Evolutionary Stability Against Multiple Mutations

Authors: Anirban Ghatak, K. S. Mallikarjuna Rao, A. J. Shaiju

Abstract: It is known (see e.g. Weibull (1995)) that ESS is not robust against multiple mutations. In this article, we introduce robustness against multiple mutations and study some equivalent formulations and consequences.

Submitted 11 January, 2012; originally announced January 2012.

Comments: Submitted article

MSC Class: 91A22

arXiv:1001.4190 [pdf]

Speech Recognition of the letter 'zha' in Tamil Language using HMM

Authors: A. Srinivasan, K. Srinivasa Rao, K. Kannan, D. Narasimhan

Abstract: Speech signals of the letter 'zha' in Tamil language of 3 males and 3 females were coded using an improved version of Linear Predictive Coding (LPC). The sampling frequency was 16 kHz and the bit rate was 15450 bits per second, where the original bit rate was 128000 bits per second, with the help of the wave surfer audio tool. The output LPC cepstrum is implemented in a first order three state Hidden Markov Model (HMM) chain.

Submitted 23 January, 2010; originally announced January 2010.

Comments: 6 Pages

Report number: IJEST09-01-02-05

Journal ref: IJEST Volume 1 Issue 2 2009 67-72

arXiv:math/0602613 [pdf, ps, other]

Two-parameter quantum algebras, twin-basic numbers, and associated generalized hypergeometric series

Authors: R. Jagannathan, K. Srinivasa Rao

Abstract: We give a method to embed the q-series in a (p,q)-series and derive the corresponding (p,q)-extensions of the known q-identities. The (p,q)-hypergeometric series, or twin-basic hypergeometric series (different from the usual bibasic hypergeometric series), is based on the concept of twin-basic number [n]_{p,q} = (p^n - q^n)/(p-q). This twin-basic number occurs in the theory of two-parameter quantum algebras and has been introduced independently in combinatorics. The (p,q)-identities thus derived, with doubling of the number of parameters, offer more choices for manipulations; for example, results that can be obtained via the limiting process of confluence in the usual q-series framework can be obtained by simpler substitutions. The q-results are of course special cases of the (p,q)-results corresponding to choosing p = 1. This also provides a new look for the q-identities.

Submitted 27 February, 2006; originally announced February 2006.

Comments: 16 pages, To appear in the Proceedings of the International Conference on Number Theory and Mathematical Physics, 20-21 December 2005, Srinivasa Ramanujan Centre, Kumbakonam, India
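
Note on the entry above: the abstract defines the twin-basic number [n]_{p,q} = (p^n - q^n)/(p-q) and states that the q-results are the special case p = 1. The short Python sketch below only illustrates that definition using exact rational arithmetic. The function names `twin_basic_number` and `q_number` are hypothetical, chosen for this illustration, and are not taken from the paper.

```python
from fractions import Fraction


def twin_basic_number(n, p, q):
    """Return [n]_{p,q} = (p^n - q^n) / (p - q) for a non-negative integer n.

    Hypothetical helper for illustration; requires p != q.
    """
    p, q = Fraction(p), Fraction(q)
    if p == q:
        raise ValueError("p and q must differ")
    return (p**n - q**n) / (p - q)


def q_number(n, q):
    """Special case p = 1: the usual basic number (1 - q^n) / (1 - q)."""
    return twin_basic_number(n, 1, q)


if __name__ == "__main__":
    # [3]_{2,3} equals p^2 + p*q + q^2 with p = 2, q = 3
    print(twin_basic_number(3, 2, 3))
    # [3]_{1,1/2} equals 1 + q + q^2 with q = 1/2
    print(q_number(3, Fraction(1, 2)))
```

For integer n >= 1 the quotient expands to the finite sum p^(n-1) + p^(n-2) q + ... + q^(n-1), which is a convenient way to check small values by hand.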

arXiv:math/0406076 [pdf, ps, other]

A probabilistic approach to second order variational inequalities with bilateral constraints

Authors: Mrinal K Ghosh, K S Mallikarjuna Rao

Abstract: We study a class of second order variational inequalities with bilateral constraints. Under certain conditions we show the existence of a unique viscosity solution of these variational inequalities and give a stochastic representation to this solution. As an application, we study a stochastic game with stopping times and show the existence of a saddle point equilibrium.

Submitted 4 June, 2004; originally announced June 2004.

Comments: 12 pages, no figures, no tables

Journal ref: Proc. Indian Acad. Sci. (Math. Sci.), Vol. 113, No. 4, November 2003, pp. 431-442

arXiv:math/0304317 [pdf, ps, other]

An Entry of Ramanujan on Hypergeometric Series in his Notebooks

Authors: K. Srinivasa Rao, G. Vanden Berghe, Christian Krattenthaler

Abstract: Example 7, after Entry 43, in Chapter XII of the first Notebook of Srinivasa Ramanujan is proved and, more generally, a summation theorem for $_3F_2(a,a,x;1+a,1+a+N;1)$, where $N$ is a non-negative integer, is derived.

Submitted 22 April, 2003; originally announced April 2003.

Comments: 8 pages, AmS-LaTeX

MSC Class: 33C20 (Primary) 33C05; 33B15 (Secondary)

Journal ref: J. Comput. Math. Appl. 173 (2004), 239-246.

arXiv:math/0003184 [pdf, ps, other]

Life and work of the mathemagician Srinivasa Ramanujan

Authors: K. Srinivasa Rao

Abstract: The Life of Srinivasa Ramanujan (1887 - 1920), the renowned Indian Mathematician, is presented, in this the first of a series of lectures, delivered at the Indian Institute for Advanced Study, Shimla.

Submitted 28 March, 2000; originally announced March 2000.

Comments: 30 pages, LaTeX file

Showing 1–31 of 31 results for author: Rao, K S