-
Straight Through Gumbel Softmax Estimator based Bimodal Neural Architecture Search for Audio-Visual Deepfake Detection
Authors:
Aravinda Reddy PN,
Raghavendra Ramachandra,
Krothapalli Sreenivasa Rao,
Pabitra Mitra,
Vinod Rathod
Abstract:
Deepfakes are a major security risk for biometric authentication. This technology creates realistic fake videos that can impersonate real people, fooling systems that rely on facial features and voice patterns for identification. Existing multimodal deepfake detectors rely on conventional fusion methods, such as majority rule and ensemble voting, which often struggle to adapt to changing data characteristics and complex patterns. In this paper, we introduce the Straight-through Gumbel-Softmax (STGS) framework, offering a comprehensive approach to search multimodal fusion model architectures. Using a two-level search approach, the framework optimizes the network architecture, parameters, and performance. Initially, crucial features were efficiently identified from backbone networks, whereas within the cell structure, a weighted fusion operation integrated information from various sources. An architecture that maximizes the classification performance is derived by varying parameters such as temperature and sampling time. The experimental results on the FakeAVCeleb and SWAN-DF datasets demonstrated an impressive AUC of 94.4%, achieved with minimal model parameters.
Submitted 19 June, 2024;
originally announced June 2024.
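As a rough illustration of the straight-through Gumbel-Softmax idea named in the abstract above, the following sketch draws one relaxed sample and its discretized one-hot counterpart. It is not the authors' search code: the function name `gumbel_softmax_sample`, the toy logits, and the temperature value are assumptions chosen only for illustration.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng):
    """Return (soft, hard): relaxed probabilities and their one-hot argmax."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)); clamp U away from 0 and 1.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    soft = [e / total for e in exps]
    # Straight-through forward pass: discretize to a one-hot vector.
    # In an autograd framework, the backward pass would use the gradient of `soft`.
    winner = soft.index(max(soft))
    hard = [1.0 if i == winner else 0.0 for i in range(len(soft))]
    return soft, hard


# Hypothetical logits over three candidate operations.
rng = random.Random(0)
soft, hard = gumbel_softmax_sample([1.0, 0.5, -0.2], temperature=0.5, rng=rng)
```

Lower temperatures push `soft` closer to the one-hot `hard` vector, which is presumably why the abstract treats temperature as a parameter to vary.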
-
MLSD-GAN -- Generating Strong High Quality Face Morphing Attacks using Latent Semantic Disentanglement
Authors:
Aravinda Reddy PN,
Raghavendra Ramachandra,
Krothapalli Sreenivasa Rao,
Pabitra Mitra
Abstract:
Face-morphing attacks are a growing concern for biometric researchers, as they can be used to fool face recognition systems (FRS). These attacks can be generated at the image level (supervised) or representation level (unsupervised). Previous unsupervised morphing attacks have relied on generative adversarial networks (GANs). More recently, researchers have used linear interpolation of StyleGAN-encoded images to generate morphing attacks. In this paper, we propose a new method for generating high-quality morphing attacks using StyleGAN disentanglement. Our approach, called MLSD-GAN, spherically interpolates the disentangled latents to produce realistic and diverse morphing attacks. We evaluate the vulnerability of MLSD-GAN on two deep-learning-based FRS techniques. The results show that MLSD-GAN poses a significant threat to FRS, as it can generate morphing attacks that are highly effective at fooling these systems.
Submitted 19 April, 2024;
originally announced April 2024.
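The MLSD-GAN abstract above mentions spherical interpolation of latents. A minimal sketch of generic spherical linear interpolation (slerp) between two vectors follows; the toy two-dimensional vectors stand in for latent codes, and nothing here reflects the paper's actual disentanglement or latent space.

```python
import math


def slerp(a, b, t):
    """Spherically interpolate between vectors a and b for t in [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Angle between the two vectors, clamped against rounding error.
    cos_omega = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if abs(sin_omega) < 1e-8:
        # Nearly parallel or antiparallel: fall back to linear interpolation
        # to avoid dividing by a value close to zero.
        return [(1.0 - t) * x + t * y for x, y in zip(a, b)]
    weight_a = math.sin((1.0 - t) * omega) / sin_omega
    weight_b = math.sin(t * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


# Toy example: the midpoint between two orthogonal unit vectors.
midpoint = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
```

Unlike linear interpolation, the midpoint of two unit vectors stays on the unit circle, which is the usual motivation for slerp in latent spaces.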
-
BOXREC: Recommending a Box of Preferred Outfits in Online Shopping
Authors:
Debopriyo Banerjee,
Krothapalli Sreenivasa Rao,
Shamik Sural,
Niloy Ganguly
Abstract:
Over the past few years, automation of outfit composition has gained much attention from the research community. Most of the existing outfit recommendation systems focus on pairwise item compatibility prediction (using visual and text features) to score an outfit combination having several items, followed by recommendation of top-n outfits or a capsule wardrobe having a collection of outfits based on user's fashion taste. However, none of these consider user's preference of price-range for individual clothing types or an overall shopping budget for a set of items. In this paper, we propose a box recommendation framework - BOXREC - which at first, collects user preferences across different item types (namely, top-wear, bottom-wear and foot-wear) including price-range of each type and a maximum shopping budget for a particular shopping session. It then generates a set of preferred outfits by retrieving all types of preferred items from the database (according to user specified preferences including price-ranges), creates all possible combinations of three preferred items (belonging to distinct item types) and verifies each combination using an outfit scoring framework - BOXREC-OSF. Finally, it provides a box full of fashion items, such that different combinations of the items maximize the number of outfits suitable for an occasion while satisfying maximum shopping budget. Empirical results show superior performance of BOXREC-OSF over the baseline methods.
Submitted 26 February, 2024;
originally announced February 2024.
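The candidate-generation step described in the BOXREC abstract above (filter items by per-type price range, form every combination of one item per type, keep the combinations a scorer accepts) can be sketched as follows. The catalog layout, the `candidate_outfits` helper, the threshold, and the constant scoring function are all hypothetical stand-ins; the real BOXREC-OSF scorer and the budget-constrained box selection are not reproduced here.

```python
from itertools import product


def candidate_outfits(catalog, price_ranges, score_fn, threshold):
    """Enumerate one-item-per-type combinations that pass a scoring check."""
    # Keep only the items whose price falls inside the user's range for that type.
    preferred = {}
    for item_type, (low, high) in price_ranges.items():
        preferred[item_type] = [
            item for item in catalog[item_type] if low <= item["price"] <= high
        ]
    # Form all combinations with exactly one item from each distinct type.
    item_types = list(price_ranges)
    outfits = []
    for combo in product(*(preferred[t] for t in item_types)):
        if score_fn(combo) >= threshold:
            outfits.append(combo)
    return outfits


# Toy catalog and preferences (illustrative values only).
catalog = {
    "top": [{"id": "t1", "price": 20}, {"id": "t2", "price": 80}],
    "bottom": [{"id": "b1", "price": 30}],
    "foot": [{"id": "f1", "price": 50}, {"id": "f2", "price": 45}],
}
price_ranges = {"top": (10, 50), "bottom": (10, 40), "foot": (40, 60)}

# Placeholder scorer that accepts every combination.
outfits = candidate_outfits(catalog, price_ranges, lambda combo: 1.0, 0.5)
outfit_ids = [tuple(item["id"] for item in combo) for combo in outfits]
```

With this toy data, the out-of-range top `t2` is filtered out before any combinations are formed, so only outfits built from `t1` remain.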
-
Efficient Indexing of Meta-Data (Extracted from Educational Videos)
Authors:
Shalika Kumbham,
Abhijit Debnath,
Krothapalli Sreenivasa Rao
Abstract:
Video lectures are becoming more popular and in demand as online classroom teaching is becoming more prevalent. Massive Open Online Courses (MOOCs), such as NPTEL, have been creating high-quality educational content that is freely accessible to students online. A large number of colleges across the country are now using NPTEL videos in their classrooms. So more video lectures are being recorded, maintained, and uploaded. These videos generally contain information about that video before the lecture begins. We generally observe that these educational videos have metadata containing five to six attributes: Institute Name, Publisher Name, Department Name, Professor Name, Subject Name, and Topic Name. It would be easy to maintain these videos if we could organize them according to their categories. The indexing of these videos based on this information is beneficial for students all around the world to efficiently utilise these videos. In this project, we are trying to get the metadata information mentioned above from the video lectures.
Submitted 11 December, 2023;
originally announced January 2024.
-
SEFGAN: Harvesting the Power of Normalizing Flows and GANs for Efficient High-Quality Speech Enhancement
Authors:
Martin Strauss,
Nicola Pia,
Nagashree K. S. Rao,
Bernd Edler
Abstract:
This paper proposes SEFGAN, a Deep Neural Network (DNN) combining maximum likelihood training and Generative Adversarial Networks (GANs) for efficient speech enhancement (SE). For this, a DNN is trained to synthesize the enhanced speech conditioned on noisy speech using a Normalizing Flow (NF) as generator in a GAN framework. While the combination of likelihood models and GANs is not trivial, SEFGAN demonstrates that a hybrid adversarial and maximum likelihood training approach enables the model to maintain high quality audio generation and log-likelihood estimation. Our experiments indicate that this approach strongly outperforms the baseline NF-based model without introducing additional complexity to the enhancement network. A comparison using computational metrics and a listening experiment reveals that SEFGAN is competitive with other state-of-the-art models.
Submitted 4 December, 2023;
originally announced December 2023.
-
Experimental signatures of quantum and topological states in frustrated magnetism
Authors:
J. Khatua,
B. Sana,
A. Zorko,
M. Gomilšek,
K. Sethupathi,
M. S. Ramachandra Rao,
M. Baenitz,
B. Schmidt,
P. Khuntia
Abstract:
Frustration in magnetic materials arising from competing exchange interactions can prevent the system from adopting long-range magnetic order and can instead lead to a diverse range of novel quantum and topological states with exotic quasiparticle excitations. Here, we review prominent examples of such emergent phenomena, including magnetically-disordered and extensively degenerate spin ices, which feature emergent magnetic monopole excitations, highly-entangled quantum spin liquids with fractional spinon excitations, topological order and emergent gauge fields, as well as complex particle-like topological spin textures known as skyrmions. We provide an overview of recent advances in the search for magnetically-disordered candidate materials on the three-dimensional pyrochlore lattice and two-dimensional triangular, kagome and honeycomb lattices, the latter with bond-dependent Kitaev interactions, and on lattices supporting topological magnetism. We highlight experimental signatures of these often elusive phenomena and single out the most suitable experimental techniques that can be used to detect them. Our review also aims at providing a comprehensive guide for designing and investigating novel frustrated magnetic materials, with the potential of addressing some important open questions in contemporary condensed matter physics.
Submitted 15 November, 2023; v1 submitted 23 October, 2023;
originally announced October 2023.
-
ExtSwap: Leveraging Extended Latent Mapper for Generating High Quality Face Swapping
Authors:
Aravinda Reddy PN,
K. Sreenivasa Rao,
Raghavendra Ramachandra,
Pabitra Mitra
Abstract:
We present a novel face swapping method using the progressively growing structure of a pre-trained StyleGAN. Previous methods use different encoder-decoder structures and embedding integration networks to produce high-quality results, but their quality suffers from entangled representation. We disentangle semantics by deriving identity and attribute features separately. By learning to map the concatenated features into the extended latent space, we leverage the state-of-the-art quality and its rich semantic extended latent space. Extensive experiments suggest that the proposed method successfully disentangles identity and attribute features and outperforms many state-of-the-art face swapping methods, both qualitatively and quantitatively.
Submitted 19 October, 2023;
originally announced October 2023.
-
Melody Extraction from Polyphonic Music by Deep Learning Approaches: A Review
Authors:
Gurunath Reddy M,
K. Sreenivasa Rao,
Partha Pratim Das
Abstract:
Melody extraction is a vital music information retrieval task among music researchers for its potential applications in education pedagogy and the music industry. Melody extraction is a notoriously challenging task due to the presence of background instruments. Also, the melodic source often exhibits characteristics similar to those of the other instruments. The interfering background accompaniment with the vocals makes extracting the melody from the mixture signal much more challenging. Until recently, classical signal processing-based melody extraction methods were quite popular among melody extraction researchers. The ability of the deep learning models to model large-scale data and the ability of the models to learn automatic features by exploiting spatial and temporal dependencies inspired many researchers to adopt deep learning models for melody extraction. In this paper, an attempt has been made to review the up-to-date data-driven deep learning approaches for melody extraction from polyphonic music. The available deep models have been categorized based on the type of neural network used and the output representation they use for predicting melody. Further, the architectures of the 25 melody extraction models are briefly presented. The loss functions used to optimize the model parameters of the melody extraction models are broadly grouped into four categories and briefly described. Also, the various input representations adopted by the melody extraction models and the parameter settings are described in depth. A section describing the explainability of the black-box melody extraction deep neural networks is included. The performance of 25 melody extraction methods is compared. The possible future directions to explore/improve the melody extraction methods are also presented in the paper.
Submitted 2 February, 2022;
originally announced February 2022.
-
On The Effect Of Coding Artifacts On Acoustic Scene Classification
Authors:
Nagashree K. S. Rao,
Nils Peters
Abstract:
Previous DCASE challenges contributed to an increase in the performance of acoustic scene classification systems. State-of-the-art classifiers demand significant processing capabilities and memory which is challenging for resource-constrained mobile or IoT edge devices. Thus, it is more likely to deploy these models on more powerful hardware and classify audio recordings previously uploaded (or streamed) from low-power edge devices. In such scenario, the edge device may apply perceptual audio coding to reduce the transmission data rate. This paper explores the effect of perceptual audio coding on the classification performance using a DCASE 2020 challenge contribution [1]. We found that classification accuracy can degrade by up to 57% compared to classifying original (uncompressed) audio. We further demonstrate how lossy audio compression techniques during model training can improve classification accuracy of compressed audio signals even for audio codecs and codec bitrates not included in the training process.
Submitted 9 December, 2021;
originally announced December 2021.
-
Multilingual Audio-Visual Smartphone Dataset And Evaluation
Authors:
Hareesh Mandalapu,
Aravinda Reddy P N,
Raghavendra Ramachandra,
K Sreenivasa Rao,
Pabitra Mitra,
S R Mahadeva Prasanna,
Christoph Busch
Abstract:
Smartphones have been employed with biometric-based verification systems to provide security in highly sensitive applications. Audio-visual biometrics are getting popular due to their usability, and also it will be challenging to spoof because of their multimodal nature. In this work, we present an audio-visual smartphone dataset captured in five different recent smartphones. This new dataset contains 103 subjects captured in three different sessions considering the different real-world scenarios. Three different languages are acquired in this dataset to include the problem of language dependency of the speaker recognition systems. These unique characteristics of this dataset will pave the way to implement novel state-of-the-art unimodal or audio-visual speaker recognition systems. We also report the performance of the bench-marked biometric verification systems on our dataset. The robustness of biometric algorithms is evaluated towards multiple dependencies like signal noise, device, language and presentation attacks like replay and synthesized signals with extensive experiments. The obtained results raised many concerns about the generalization properties of state-of-the-art biometrics methods in smartphones.
Submitted 15 November, 2021; v1 submitted 9 September, 2021;
originally announced September 2021.
-
Audio-Visual Biometric Recognition and Presentation Attack Detection: A Comprehensive Survey
Authors:
Hareesh Mandalapu,
P N Aravinda Reddy,
Raghavendra Ramachandra,
K Sreenivasa Rao,
Pabitra Mitra,
S R Mahadeva Prasanna,
Christoph Busch
Abstract:
Biometric recognition is a trending technology that uses unique characteristics data to identify or verify/authenticate security applications. Amidst the classically used biometrics, voice and face attributes are the most propitious for prevalent applications in day-to-day life because they are easy to obtain through restrained and user-friendly procedures. The pervasiveness of low-cost audio and face capture sensors in smartphones, laptops, and tablets has made the advantage of voice and face biometrics more exceptional when compared to other biometrics. For many years, acoustic information alone has been a great success in automatic speaker verification applications. Meantime, the last decade or two has also witnessed a remarkable ascent in face recognition technologies. Nonetheless, in adverse unconstrained environments, neither of these techniques achieves optimal performance. Since audio-visual information carries correlated and complementary information, integrating them into one recognition system can increase the system's performance. The vulnerability of biometrics towards presentation attacks and audio-visual data usage for the detection of such attacks is also a hot topic of research. This paper made a comprehensive survey on existing state-of-the-art audio-visual recognition techniques, publicly available databases for benchmarking, and Presentation Attack Detection (PAD) algorithms. Further, a detailed discussion on challenges and open problems is presented in this field of biometrics.
Submitted 12 March, 2021; v1 submitted 24 January, 2021;
originally announced January 2021.
-
Optimal governance and implementation of vaccination programmes to contain the COVID-19 pandemic
Authors:
Mahendra Piraveenan,
Shailendra Sawleshwarkar,
Michael Walsh,
Iryna Zablotska,
Samit Bhattacharyya,
Habib Hassan Farooqui,
Tarun Bhatnagar,
Anup Karan,
Manoj Murhekar,
Sanjay Zodpey,
K. S. Mallikarjuna Rao,
Philippa Pattison,
Albert Zomaya,
Matjaz Perc
Abstract:
Since the recent introduction of several viable vaccines for SARS-CoV-2, vaccination uptake has become the key factor that will determine our success in containing the COVID-19 pandemic. We argue that game theory and social network models should be used to guide decisions pertaining to vaccination programmes for the best possible results. In the months following the introduction of vaccines, their availability and the human resources needed to run the vaccination programmes have been scarce in many countries. Vaccine hesitancy is also being encountered from some sections of the general public. We emphasize that decision-making under uncertainty and imperfect information, and with only conditionally optimal outcomes, is a unique forte of established game-theoretic modelling. Therefore, we can use this approach to obtain the best framework for modelling and simulating vaccination prioritization and uptake that will be readily available to inform important policy decisions for the optimal control of the COVID-19 pandemic.
Submitted 9 June, 2021; v1 submitted 12 November, 2020;
originally announced November 2020.
-
Knowledge Distillation for Singing Voice Detection
Authors:
Soumava Paul,
Gurunath Reddy M,
K Sreenivasa Rao,
Partha Pratim Das
Abstract:
Singing Voice Detection (SVD) has been an active area of research in music information retrieval (MIR). Currently, two deep neural network-based methods, one based on CNN and the other on RNN, exist in the literature that learn optimized features for the voice detection (VD) task and achieve state-of-the-art performance on common datasets. Both these models have a huge number of parameters (1.4M for CNN and 65.7K for RNN) and are hence not suitable for deployment on devices like smartphones or embedded sensors with limited capacity in terms of memory and computation power. The most popular method to address this issue is known as knowledge distillation in the deep learning literature (in addition to model compression), where a large pre-trained network known as the teacher is used to train a smaller student network. Given the wide applications of SVD in music information retrieval, to the best of our knowledge, model compression for practical deployment has not yet been explored. In this paper, efforts have been made to investigate this issue using both conventional as well as ensemble knowledge distillation techniques.
Submitted 19 August, 2021; v1 submitted 9 November, 2020;
originally announced November 2020.
-
DNN-based cross-lingual voice conversion using Bottleneck Features
Authors:
M Kiran Reddy,
K Sreenivasa Rao
Abstract:
Cross-lingual voice conversion (CLVC) is a quite challenging task since the source and target speakers speak different languages. This paper proposes a CLVC framework based on bottleneck features and deep neural network (DNN). In the proposed method, the bottleneck features extracted from a deep auto-encoder (DAE) are used to represent speaker-independent features of speech signals from different languages. A DNN model is trained to learn the mapping between bottleneck features and the corresponding spectral features of the target speaker. The proposed method can capture speaker-specific characteristics of a target speaker, and hence requires no speech data from source speaker during training. The performance of the proposed method is evaluated using data from three Indian languages: Telugu, Tamil and Malayalam. The experimental results show that the proposed method outperforms the baseline Gaussian mixture model (GMM)-based CLVC approach.
Submitted 10 September, 2019; v1 submitted 9 September, 2019;
originally announced September 2019.
-
Multilingual and Multimode Phone Recognition System for Indian Languages
Authors:
Kumud Tripathi,
M. Kiran Reddy,
K. Sreenivasa Rao
Abstract:
The aim of this paper is to develop a flexible framework capable of automatically recognizing phonetic units present in a speech utterance of any language spoken in any mode. In this study, we considered two modes of speech: conversation, and read modes in four Indian languages, namely, Telugu, Kannada, Odia, and Bengali. The proposed approach consists of two stages: (1) Automatic speech mode classification (SMC) and (2) Automatic phonetic recognition using mode-specific multilingual phone recognition system (MPRS). In this work, the vocal tract and excitation source features are considered for speech mode classification (SMC) task. SMC systems are developed using multilayer perceptron (MLP). Further, vocal tract, excitation source, and tandem features are used to build the deep neural network (DNN)-based MPRSs. The performance of the proposed approach is compared with mode-dependent MPRSs. Experimental results show that the proposed approach which combines both SMC and MPRS into a single system outperforms the baseline mode-dependent MPRSs.
Submitted 23 August, 2019;
originally announced August 2019.
-
VOP Detection for Read and Conversation Speech using CWT Coefficients and Phone Boundaries
Authors:
Kumud Tripathi,
K. Sreenivasa Rao
Abstract:
In this paper, we propose a novel approach for accurate detection of the vowel onset points (VOPs). VOP is the instant at which the vowel begins in the speech signal. Precise identification of VOPs is important for various speech applications such as speech segmentation and speech rate modification. The existing methods detect the majority of VOPs within 40 ms deviation, and it may not be appropriate for the above speech applications. To address this issue, we proposed a two-stage approach for accurate detection of VOPs. At the first stage, VOPs are detected using continuous wavelet transform coefficients, and the position of the detected VOPs are corrected using the phone boundaries in the second stage. The phone boundaries are detected by the spectral transition measure method. Experiments are done using TIMIT and Bengali speech corpora. Performance of the proposed approach is compared with two standard signal processing based methods. The evaluation results show that the proposed method performs better than the existing methods.
Submitted 23 August, 2019;
originally announced August 2019.
-
hf0: A hybrid pitch extraction method for multimodal voice
Authors:
Pradeep Rengaswamy,
Gurunath Reddy M,
Krothapalli Sreenivasa Rao
Abstract:
Pitch or fundamental frequency (f0) extraction is a fundamental problem studied extensively for its potential applications in speech and clinical applications. In the literature, explicit mode-specific (modal speech, singing voice, emotional/expressive speech, or noisy speech) signal processing and deep learning f0 extraction methods have been developed that exploit the quasi-periodic nature of the signal in time, its harmonic property in the spectrum, or a combination of both to extract the pitch. Hence, there is no single unified method which can reliably extract the pitch from various modes of the acoustic signal. In this work, we propose a hybrid f0 extraction method which seamlessly extracts the pitch across modes of speech production with the very high accuracy required for many applications. The proposed hybrid model exploits the advantages of deep learning and signal processing methods to minimize the pitch detection error and adapts to various modes of acoustic signal. Specifically, we propose an ordinal regression convolutional neural network to map the periodicity-rich input representation to nominal pitch classes, which drastically reduces the number of classes required for pitch detection unlike other deep learning approaches. Further, the accurate f0 is estimated from the nominal pitch class labels by filtering and autocorrelation. We show that the proposed method generalizes to the unseen modes of voice production and various noises for large scale datasets. Also, the proposed hybrid model significantly reduces the learning parameters required to train the deep model compared to other methods. Furthermore, the evaluation measures showed that the proposed method is significantly better than the state-of-the-art signal processing and deep learning approaches.
Submitted 22 April, 2019;
originally announced April 2019.
-
Glottal Closure Instants Detection From Pathological Acoustic Speech Signal Using Deep Learning
Authors:
Gurunath Reddy M,
Tanumay Mandal,
Krothapalli Sreenivasa Rao
Abstract:
In this paper, we propose classification-based glottal closure instant (GCI) detection from pathological acoustic speech signals, which finds many applications in vocal disorder analysis. To date, GCIs for pathological disorders have been extracted from the laryngeal (glottal source) signal recorded with an electroglottograph, a dedicated device designed to measure vocal fold vibration around the larynx. We have created a pathological dataset which consists of simultaneous recordings of the glottal source and acoustic speech signals for six different disorders from patients with vocal disorders. The GCI locations are manually annotated for disorder analysis and supervised learning. We propose a convolutional neural network based GCI detection method that fuses deep acoustic speech and linear prediction residual features for robust GCI detection. The experimental results showed that the proposed method is significantly better than the state-of-the-art GCI detection methods.
Submitted 25 November, 2018;
originally announced November 2018.
-
Multivariate Public Key Cryptography and Digital Signature
Authors:
Pulugurtha Krishna Subba Rao,
Duggirala Meher Krishna,
Duggirala Ravi
Abstract:
In this paper, algorithms for multivariate public key cryptography and digital signature are described. Plain messages and encrypted messages are arrays, consisting of elements from a fixed finite ring or field. The encryption and decryption algorithms are based on multivariate mappings. The security of the private key depends on the difficulty of solving a system of parametric simultaneous multivariate equations involving polynomial or exponential mappings. The method is a general purpose utility for most data encryption, digital certificate or digital signature applications. For security protocols of the application layer level in the OSI model, the methods described in this paper are useful.
Submitted 23 July, 2018; v1 submitted 20 July, 2018;
originally announced July 2018.
-
Competing Ferromagnetic and Anti-Ferromagnetic interactions in Iron Nitride $ζ$-Fe$_2$N
Authors:
K. Sandeep Rao,
H. G. Salunke
Abstract:
The paper discusses the magnetic state of the zeta phase of iron nitride, viz. $ζ$-Fe$_2$N, on the basis of spin-polarized first-principles electronic structure calculations together with a review of already published data. Results of our first-principles study suggest that the ground state of $ζ$-Fe$_2$N is ferromagnetic (FM) with a magnetic moment of 1.528 $μ_\text{B}$ on the Fe site. The FM ground state is lower in energy than the anti-ferromagnetic (AFM) state by 8.44 meV and the non-magnetic (NM) state by 191 meV per formula unit. These results are important in view of reports which claim that $ζ$-Fe$_2$N undergoes an AFM transition below 10 K and others which do not observe any magnetic transition down to 4.2 K. We argue that the experimental results of an AFM transition below 10 K are inconclusive and we propose the presence of competing FM and AFM superexchange interactions between Fe sites mediated by nitrogen atoms, which are consistent with the Goodenough-Kanamori-Anderson rules. We find that the anti-ferromagnetically coupled Fe sites are outnumbered by ferromagnetically coupled Fe sites, leading to a stable FM ground state. A Stoner analysis of the results also supports our claim of an FM ground state.
Submitted 19 June, 2017;
originally announced June 2017.
-
Evolutionary Stability of Polymorphic Population States in Continuous Games
Authors:
Dharini Hingu,
K. S. Mallikarjuna Rao,
A. J. Shaiju
Abstract:
In games with continuous strategy spaces, if a rest point of the replicator dynamics is asymptotically stable then the rest point must be finitely supported (van Veelen, M., Spreij, P., 2009. Evolution in games with a continuous action space. Econom. Theory 39 (3), 355-376). In this article, we address the converse question: we prove that a finitely supported population state is asymptotically stable with respect to the variational norm when it is strongly uninvadable.
Submitted 24 May, 2016;
originally announced May 2016.
-
Characterization of maximum hands-off control
Authors:
Debasish Chatterjee,
Masaaki Nagahara,
Daniel Quevedo,
K. S. Mallikarjuna Rao
Abstract:
Maximum hands-off control aims to maximize the length of time over which zero actuator values are applied to a system when executing specified control tasks. To tackle such problems, recent literature has investigated optimal control problems which penalize the size of the support of the control function and thereby lead to desired sparsity properties. This article gives the exact set of necessary conditions for a maximum hands-off optimal control problem using an $L_0$-(semi)norm, and also provides sufficient conditions for the optimality of such controls. A numerical example illustrates that adopting an $L_0$ cost leads to a sparse control, whereas an $L_1$-relaxation in singular problems leads to a non-sparse solution.
Submitted 29 February, 2016;
originally announced February 2016.
-
Facile preparation of agarose-chitosan hybrid materials and nanocomposite ionogels using an ionic liquid via dissolution, regeneration and sol-gel transition
Authors:
Tushar J. Trivedi,
K. Srinivasa Rao,
Arvind Kumar
Abstract:
We report simultaneous dissolution of agarose (AG) and chitosan (CH) in varying proportions in an ionic liquid (IL), 1-butyl-3-methylimidazolium chloride [C4mim][Cl]. Composite materials were constructed from AG-CH-IL solutions using the antisolvent methanol, and IL was recovered from the solutions. Composite materials could be uniformly decorated with silver oxide (Ag2O) nanoparticles (Ag NPs) to form nanocomposites in a single step by in situ synthesis of Ag NPs in AG-CH-IL sols, wherein the biopolymer moiety acted as both reducing and stabilizing agent. Cooling of Ag NPs-AG-CH-IL sols to room temperature resulted in nanocomposite ionogels with high conductivity and high mechanical strength. The structure, stability and physicochemical properties of composite materials and nanocomposites were characterized by several analytical techniques, such as Fourier transform infrared (FTIR) spectroscopy, CD spectroscopy, differential scanning calorimetry (DSC), thermogravimetric analysis (TGA), gel permeation chromatography (GPC), and scanning electron microscopy (SEM). The results show that composite materials have good thermal and conformational stability, compatibility and strong hydrogen bonding interactions between AG-CH complexes. Decoration of Ag NPs in composites and ionogels was confirmed by UV-Vis spectroscopy, SEM, TEM, EDAX and XRD. The mechanical and conducting properties of composite ionogels have been characterized by rheology and current-voltage measurements. Since Ag NPs show good antimicrobial activity, Ag NPs-AG-CH composite materials have the potential to be used in biotechnology and biomedical applications, whereas nanocomposite ionogels will be suitable as precursors for applications such as quasi-solid dye sensitized solar cells, actuators, sensors or electrochromic displays.
Submitted 17 September, 2014;
originally announced September 2014.
-
A New Upperbound for the Oblivious Transfer Capacity of Discrete Memoryless Channels
Authors:
K. Sankeerth Rao,
Vinod M. Prabhakaran
Abstract:
We derive a new upper bound on the string oblivious transfer capacity of discrete memoryless channels. The main tool we use is the tension region of a pair of random variables introduced in Prabhakaran and Prabhakaran (2014) where it was used to derive upper bounds on rates of secure sampling in the source model. In this paper, we consider secure computation of string oblivious transfer in the channel model. Our bound is based on a monotonicity property of the tension region in the channel model. We show that our bound strictly improves upon the upper bound of Ahlswede and Csiszár (2013).
Submitted 8 May, 2014;
originally announced May 2014.
-
AutoAmp : An Open-Source Analog Amplifier Design Tool - For Classroom and Lab Purposes
Authors:
Om Prasad Patri,
K. Sanmukh Rao
Abstract:
This correspondence presents an open-source tool, AutoAmp, developed at the Indian Institute of Technology, Guwahati. It is available at http://sourceforge.net/projects/autoamp-iitg/ . This tool helps the user to design different types of electronic amplifiers, using solid state devices, for a given specification. It can handle several types of designs, namely common-emitter BJT amplifiers (single and two-stage), operational amplifiers (inverting and non-inverting) and power amplifiers. Not only does it design the amplifier, it also simulates the designed amplifier using a SPICE simulator and displays the performance curves. This tool is expected to prove invaluable in undergraduate teaching and labs. Especially in electronics-design related laboratories, the student need not design by hand the amplifiers, which are at the heart of many electronic designs.
Submitted 19 September, 2012;
originally announced September 2012.
-
Evolutionary Stability Against Multiple Mutations
Authors:
Anirban Ghatak,
K. S. Mallikarjuna Rao,
A. J. Shaiju
Abstract:
It is known (see e.g. Weibull (1995)) that ESS is not robust against multiple mutations. In this article, we introduce robustness against multiple mutations and study some equivalent formulations and consequences.
Submitted 11 January, 2012;
originally announced January 2012.
-
Speech Recognition of the letter 'zha' in Tamil Language using HMM
Authors:
A. Srinivasan,
K. Srinivasa Rao,
K. Kannan,
D. Narasimhan
Abstract:
Speech signals of the letter 'zha' in the Tamil language from 3 males and 3 females were coded using an improved version of Linear Predictive Coding (LPC). The sampling frequency was 16 kHz and the bit rate was 15450 bits per second, where the original bit rate was 128000 bits per second, with the help of the wave surfer audio tool. The output LPC cepstrum is implemented in a first-order three-state Hidden Markov Model (HMM) chain.
Submitted 23 January, 2010;
originally announced January 2010.
-
Two-parameter quantum algebras, twin-basic numbers, and associated generalized hypergeometric series
Authors:
R. Jagannathan,
K. Srinivasa Rao
Abstract:
We give a method to embed the q-series in a (p,q)-series and derive the corresponding (p,q)-extensions of the known q-identities. The (p,q)-hypergeometric series, or twin-basic hypergeometric series (different from the usual bibasic hypergeometric series), is based on the concept of twin-basic number [n]_{p,q} = (p^n - q^n)/(p-q). This twin-basic number occurs in the theory of two-parameter quantum algebras and has been introduced independently in combinatorics. The (p,q)-identities thus derived, with doubling of the number of parameters, offer more choices for manipulations; for example, results that can be obtained via the limiting process of confluence in the usual q-series framework can be obtained by simpler substitutions. The q-results are of course special cases of the (p,q)-results corresponding to choosing p = 1. This also provides a new look for the q-identities.
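As an illustration of the twin-basic number defined in the abstract above, the following is a minimal sketch, not code from the paper. The function names `twin_basic` and `q_number` are hypothetical, and exact rational arithmetic is an assumption made here so that the p = 1 special case can be checked without rounding error.

```python
from fractions import Fraction


def twin_basic(n, p, q):
    """Twin-basic number [n]_{p,q} = (p^n - q^n) / (p - q), for p != q.

    Hypothetical helper for illustration; n is a non-negative integer.
    """
    p, q = Fraction(p), Fraction(q)
    if p == q:
        raise ValueError("the closed form requires p != q")
    return (p**n - q**n) / (p - q)


def q_number(n, q):
    """Usual basic number [n]_q = 1 + q + ... + q^(n-1)."""
    q = Fraction(q)
    return sum(q**k for k in range(n))


# Choosing p = 1 recovers the ordinary q-number, as the abstract notes.
half = Fraction(1, 2)
print(twin_basic(4, 1, half), q_number(4, half))
```

The definition is symmetric in p and q, so `twin_basic(n, p, q)` and `twin_basic(n, q, p)` agree for any admissible inputs.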
Submitted 27 February, 2006;
originally announced February 2006.
-
A probabilistic approach to second order variational inequalities with bilateral constraints
Authors:
Mrinal K Ghosh,
K S Mallikarjuna Rao
Abstract:
We study a class of second order variational inequalities with bilateral constraints. Under certain conditions we show the existence of a unique viscosity solution of these variational inequalities and give a stochastic representation to this solution. As an application, we study a stochastic game with stopping times and show the existence of a saddle point equilibrium.
Submitted 4 June, 2004;
originally announced June 2004.
-
An Entry of Ramanujan on Hypergeometric Series in his Notebooks
Authors:
K. Srinivasa Rao,
G. Vanden Berghe,
Christian Krattenthaler
Abstract:
Example 7, after Entry 43, in Chapter XII of the first Notebook of Srinivasa Ramanujan is proved and, more generally, a summation theorem for $_3F_2(a,a,x;1+a,1+a+N;1)$, where $N$ is a non-negative integer, is derived.
Submitted 22 April, 2003;
originally announced April 2003.
-
Life and work of the mathemagician Srinivasa Ramanujan
Authors:
K. Srinivasa Rao
Abstract:
The Life of Srinivasa Ramanujan (1887 - 1920), the renowned Indian Mathematician, is presented, in this the first of a series of lectures, delivered at the Indian Institute for Advanced Study, Shimla.
Submitted 28 March, 2000;
originally announced March 2000.