Showing 1–29 of 29 results for author: Ramakrishnan, A G

  1. arXiv:2406.18135  [pdf]

    cs.CL cs.SD eess.AS

    Automatic Speech Recognition for Hindi

    Authors: Anish Saha, A. G. Ramakrishnan

    Abstract: Automatic speech recognition (ASR) is a key area in computational linguistics, focusing on developing technologies that enable computers to convert spoken language into text. This field combines linguistics and machine learning. ASR models, which map speech audio to transcripts through supervised learning, require handling real and unrestricted text. Text-to-speech systems directly work with real…

    Submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2312.09599  [pdf]

    eess.SP

    Brain-scale Theta Band Functional Connectivity As A Signature of Slow Breathing and Breath-hold Phases

    Authors: Anusha A. S., Pradeep Kumar G., A. G. Ramakrishnan

    Abstract: The study reported herein attempts to understand the neural mechanisms engaged in the conscious control of breathing and breath-hold. The variations in the electroencephalogram (EEG) based functional connectivity (FC) of the human brain during consciously controlled breathing at 2 cycles per minute (cpm), and breath-hold have been investigated and reported here. An experimental protocol involving…

    Submitted 15 December, 2023; originally announced December 2023.

  3. arXiv:2310.17138  [pdf, other]

    cs.CV

    A Classifier Using Global Character Level and Local Sub-unit Level Features for Hindi Online Handwritten Character Recognition

    Authors: Anand Sharma, A. G. Ramakrishnan

    Abstract: A classifier is developed that defines a joint distribution of global character features, number of sub-units and local sub-unit features to model Hindi online handwritten characters. The classifier uses latent variables to model the structure of sub-units. The classifier uses histograms of points, orientations, and dynamics of orientations (HPOD) features to represent characters at global charact…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: 23 pages, 8 jpg figures. arXiv admin note: text overlap with arXiv:2310.08222

  4. arXiv:2310.08222  [pdf, other]

    cs.CV

    Structural analysis of Hindi online handwritten characters for character recognition

    Authors: Anand Sharma, A. G. Ramakrishnan

    Abstract: Direction properties of online strokes are used to analyze them in terms of homogeneous regions or sub-strokes with points satisfying common geometric properties. Such sub-strokes are called sub-units. These properties are used to extract sub-units from Hindi ideal online characters. These properties along with some heuristics are used to extract sub-units from Hindi online handwritten characters.…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: 34 pages, 36 jpg figures

  5. arXiv:2309.02067  [pdf, other]

    cs.CV eess.SP

    Histograms of Points, Orientations, and Dynamics of Orientations Features for Hindi Online Handwritten Character Recognition

    Authors: Anand Sharma, A. G. Ramakrishnan

    Abstract: A set of features independent of character stroke direction and order variations is proposed for online handwritten character recognition. A method is developed that maps features like co-ordinates of points, orientations of strokes at points, and dynamics of orientations of strokes at points spatially as a function of co-ordinate values of the points and computes histograms of these features from…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: 21 pages, 12 jpg figures
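
A minimal Python sketch of the general idea in the abstract above: per-cell histograms of point locations and local stroke orientations over a spatial grid. The grid size, bin count, bounding-box normalization and the folding of angles to [0, pi) (so that reversing a stroke leaves the features unchanged) are all assumptions for illustration. `spatial_orientation_histogram` is a hypothetical helper; the dynamics-of-orientation feature and the paper's exact spatial mapping are not reproduced.

```python
import numpy as np

def spatial_orientation_histogram(points, grid=4, n_bins=8):
    """Hypothetical simplified descriptor for one online character.

    points: (N, 2) array of pen-tip (x, y) samples, N >= 2.
    Returns a vector of length grid * grid * (1 + n_bins): for every spatial
    cell, a point count followed by a histogram of local stroke orientations.
    """
    pts = np.asarray(points, dtype=float)
    # Normalize the bounding box to [0, 1] (position and size invariance).
    mins, maxs = pts.min(axis=0), pts.max(axis=0)
    unit = (pts - mins) / np.maximum(maxs - mins, 1e-9)
    # Assign each point to a grid cell; clip so points on the upper edge stay inside.
    cells = np.minimum((unit * grid).astype(int), grid - 1)
    cell_id = cells[:, 1] * grid + cells[:, 0]
    # Local direction from finite differences, folded to [0, pi) so that a stroke
    # written in the opposite direction yields the same orientation.
    d = np.gradient(pts, axis=0)
    theta = np.mod(np.arctan2(d[:, 1], d[:, 0]), np.pi)
    ori_bin = np.minimum((theta / np.pi * n_bins).astype(int), n_bins - 1)
    feats = np.zeros((grid * grid, 1 + n_bins))
    np.add.at(feats, (cell_id, 0), 1.0)            # point-location histogram
    np.add.at(feats, (cell_id, 1 + ori_bin), 1.0)  # orientation histogram per cell
    return feats.ravel()

stroke = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, 0.0], [3.0, 1.0], [3.5, 2.0]])
f_fwd = spatial_orientation_histogram(stroke)
f_rev = spatial_orientation_histogram(stroke[::-1])  # same stroke, reversed direction
```

Because every point only increments histogram bins, the descriptor ignores the order in which points were written, and folding the angle removes the writing direction.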

  6. arXiv:2109.05494  [pdf, other]

    cs.CL cs.SD eess.AS

    Unsupervised Domain Adaptation Schemes for Building ASR in Low-resource Languages

    Authors: Anoop C S, Prathosh A P, A G Ramakrishnan

    Abstract: Building an automatic speech recognition (ASR) system from scratch requires a large amount of annotated speech data, which is difficult to collect in many languages. However, there are cases where the low-resource language shares a common acoustic space with a high-resource language having enough annotated data to build an ASR. In such cases, we show that the domain-independent acoustic models lea…

    Submitted 16 September, 2021; v1 submitted 12 September, 2021; originally announced September 2021.

    Comments: Submitted to ASRU 2021

  7. arXiv:2003.10433  [pdf, ps, other]

    q-bio.NC cs.LG eess.SP

    Decoding Imagined Speech using Wavelet Features and Deep Neural Networks

    Authors: Jerrin Thomas Panachakel, A. G. Ramakrishnan

    Abstract: This paper proposes a novel approach that uses deep neural networks for classifying imagined speech, significantly increasing the classification accuracy. The proposed approach employs only the EEG channels over specific areas of the brain for classification, and derives distinct feature vectors from each of those channels. This gives us more data to train a classifier, enabling us to use deep lea…

    Submitted 18 March, 2020; originally announced March 2020.

    Comments: Preprint of the paper presented in 2019 IEEE 16th India Council International Conference (INDICON). arXiv admin note: substantial text overlap with arXiv:2003.09374

  8. arXiv:2003.10212  [pdf, other]

    q-bio.NC cs.AI eess.SP

    An Improved EEG Acquisition Protocol Facilitates Localized Neural Activation

    Authors: Jerrin Thomas Panachakel, Nandagopal Netrakanti Vinayak, Maanvi Nunna, A. G. Ramakrishnan, Kanishka Sharma

    Abstract: This work proposes improvements in the electroencephalogram (EEG) recording protocols for motor imagery through the introduction of actual motor movement and/or somatosensory cues. The results obtained demonstrate the advantage of requiring the subjects to perform motor actions following the trials of imagery. By introducing motor actions in the protocol, the subjects are able to perform actual mo…

    Submitted 13 March, 2020; originally announced March 2020.

    Comments: Preprint of the paper presented at ComNet 2019

  9. arXiv:2003.09374  [pdf, other]

    eess.SP cs.LG stat.ML

    A Novel Deep Learning Architecture for Decoding Imagined Speech from EEG

    Authors: Jerrin Thomas Panachakel, A. G. Ramakrishnan, T. V. Ananthapadmanabha

    Abstract: The recent advances in the field of deep learning have not been fully utilised for decoding imagined speech primarily because of the unavailability of sufficient training samples to train a deep network. In this paper, we present a novel architecture that employs deep neural network (DNN) for classifying the words "in" and "cooperate" from the corresponding EEG signals in the ASU imagined speech d…

    Submitted 18 March, 2020; originally announced March 2020.

    Comments: Preprint of the paper presented at IEEE AIBEC 2019, Austria

  10. arXiv:1902.05411  [pdf, other]

    cs.CV cs.LG stat.ML

    Improving Facial Emotion Recognition Systems Using Gradient and Laplacian Images

    Authors: Ram Krishna Pandey, Souvik Karmakar, A G Ramakrishnan, Nabagata Saha

    Abstract: In this work, we have proposed several enhancements to improve the performance of any facial emotion recognition (FER) system. We believe that the changes in the positions of the fiducial points and the intensities capture the crucial information regarding the emotion of a face image. We propose the use of the gradient and the Laplacian of the input image together with the original input into a co…

    Submitted 12 February, 2019; originally announced February 2019.
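
One way to build the kind of multi-channel input the abstract describes, sketched in Python under stated assumptions: the gradient is taken as the central-difference gradient magnitude from `np.gradient`, the Laplacian as the 5-point stencil with edge replication at the borders, and the three maps are stacked as channels for a CNN. `stack_gradient_laplacian` is a hypothetical helper; the paper's exact operators and any normalization are not given in the truncated abstract.

```python
import numpy as np

def stack_gradient_laplacian(img):
    """Hypothetical helper: (H, W) image -> (3, H, W) array [image, |grad|, Laplacian]."""
    I = np.asarray(img, dtype=float)
    gy, gx = np.gradient(I)                 # derivatives along rows (y) and columns (x)
    grad_mag = np.hypot(gx, gy)
    # 5-point Laplacian; borders handled by replicating edge pixels (an assumption).
    P = np.pad(I, 1, mode="edge")
    lap = P[:-2, 1:-1] + P[2:, 1:-1] + P[1:-1, :-2] + P[1:-1, 2:] - 4.0 * I
    return np.stack([I, grad_mag, lap])

# A linear ramp has constant gradient magnitude and zero interior Laplacian.
ramp = 2.0 * np.arange(5)[None, :] + 3.0 * np.arange(5)[:, None]
channels = stack_gradient_laplacian(ramp)
```

The gradient channel highlights where intensities change (edges around fiducial regions), while the Laplacian responds to curvature in the intensity surface; both are fixed, parameter-free transforms computed once per image.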

  11. arXiv:1812.02475  [pdf, other]

    cs.CV

    Binary Document Image Super Resolution for Improved Readability and OCR Performance

    Authors: Ram Krishna Pandey, K Vignesh, A G Ramakrishnan, Chandrahasa B

    Abstract: There is a need for information retrieval from large collections of low-resolution (LR) binary document images, which can be found in digital libraries across the world, where the high-resolution (HR) counterpart is not available. This gives rise to the problem of binary document image super-resolution (BDISR). The objective of this paper is to address the interesting and challenging problem of su…

    Submitted 6 December, 2018; originally announced December 2018.

  12. arXiv:1812.02447  [pdf, other]

    eess.AS cs.SD

    Pitch-synchronous DCT features: A pilot study on speaker identification

    Authors: Amit Meghanani, A G Ramakrishnan

    Abstract: We propose a new feature, namely, pitch-synchronous discrete cosine transform (PS-DCT), for the task of speaker identification. These features are obtained directly from the voiced segments of the speech signal, without any pre-emphasis or windowing. The feature vectors are vector quantized, to create one separate codebook for each speaker during training. The performance of the PS-DCT features is s…

    Submitted 6 December, 2018; originally announced December 2018.

  13. arXiv:1809.00961  [pdf, other]

    cs.CV cs.LG stat.ML

    MSCE: An edge preserving robust loss function for improving super-resolution algorithms

    Authors: Ram Krishna Pandey, Nabagata Saha, Samarjit Karmakar, A G Ramakrishnan

    Abstract: With the recent advancement in the deep learning technologies such as CNNs and GANs, there is significant improvement in the quality of the images reconstructed by deep learning based super-resolution (SR) techniques. In this work, we propose a robust loss function based on the preservation of edges obtained by the Canny operator. This loss function, when combined with the existing loss function s…

    Submitted 25 August, 2018; originally announced September 2018.

    Comments: Accepted in ICONIP-2018

  14. arXiv:1808.09432  [pdf, other]

    eess.AS cs.SD

    Using Monte Carlo dropout for non-stationary noise reduction from speech

    Authors: Nazreen P. M., A. G. Ramakrishnan

    Abstract: In this work, we propose the use of dropout as a Bayesian estimator for increasing the generalizability of a deep neural network (DNN) for speech enhancement. By using Monte Carlo (MC) dropout, we show that the DNN performs better enhancement in unseen noise and SNR conditions. The DNN is trained on speech corrupted with Factory2, M109, Babble, Leopard and Volvo noises at SNRs of 0, 5 and 10 dB. S…

    Submitted 28 August, 2018; originally announced August 2018.

    Comments: This article draws from our previous work arXiv:1806.00516
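
A minimal sketch of Monte Carlo dropout at inference time, assuming a small NumPy regression network with random weights standing in for a trained enhancement DNN: dropout is left on, `T` stochastic forward passes are averaged to give the estimate, and their variance gives a rough uncertainty measure. The layer sizes (257 inputs, e.g. the bins of a 512-point FFT), `p_drop` and `T` are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer network with random weights (stand-in for a trained DNN).
W1, b1 = rng.standard_normal((257, 512)) * 0.05, np.zeros(512)
W2, b2 = rng.standard_normal((512, 257)) * 0.05, np.zeros(257)

def forward(x, p_drop, rng):
    """One stochastic pass with dropout kept ON at inference (inverted dropout)."""
    h = np.maximum(x @ W1 + b1, 0.0)                 # ReLU hidden layer
    if p_drop > 0:
        mask = rng.random(h.shape) >= p_drop         # keep each unit with prob 1 - p_drop
        h = h * mask / (1.0 - p_drop)
    return h @ W2 + b2

def mc_dropout_predict(x, p_drop=0.2, T=50, rng=rng):
    """Average T stochastic passes; the per-output variance is a rough uncertainty."""
    samples = np.stack([forward(x, p_drop, rng) for _ in range(T)])
    return samples.mean(axis=0), samples.var(axis=0)

noisy_frames = rng.standard_normal((4, 257))         # stand-in noisy spectral frames
mean, var = mc_dropout_predict(noisy_frames)
```

Setting `p_drop=0` collapses this to an ordinary deterministic forward pass, which is a convenient sanity check.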

  15. arXiv:1807.05927  [pdf, other]

    cs.CV

    Computationally Efficient Approaches for Image Style Transfer

    Authors: Ram Krishna Pandey, Samarjit Karmakar, A G Ramakrishnan

    Abstract: In this work, we have investigated various style transfer approaches and (i) examined how the stylized reconstruction changes with the change of loss function and (ii) provided a computationally efficient solution for the same. We have used elegant techniques like depth-wise separable convolution in place of convolution and nearest neighbor interpolation in place of transposed convolution. Further…

    Submitted 16 July, 2018; originally announced July 2018.
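
A quick parameter-count comparison showing why depth-wise separable convolution is cheaper than a standard convolution. The 3×3 kernel and 64 to 128 channels are illustrative assumptions, not layer sizes from the paper. Nearest-neighbor interpolation, the other substitution mentioned, has no learnable parameters at all, unlike a transposed convolution.

```python
def conv_params(k, c_in, c_out, bias=False):
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def depthwise_separable_params(k, c_in, c_out, bias=False):
    """k x k depthwise conv (one filter per input channel) + 1 x 1 pointwise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise

std = conv_params(3, 64, 128)                  # 3*3*64*128
sep = depthwise_separable_params(3, 64, 128)   # 3*3*64 + 64*128
print(std, sep, round(std / sep, 1))           # 73728 8768 8.4
```

The saving grows with the number of output channels, since the k×k spatial filtering is done once per input channel instead of once per input-output channel pair.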

  16. arXiv:1807.05813  [pdf, other]

    cs.SD eess.AS

    Subjective and objective experiments on the influence of speaker's gender on the unvoiced segments

    Authors: A Madhavaraj, T V Ananthapadmanabha, A G Ramakrishnan

    Abstract: Subjective and objective experiments are conducted to understand the extent to which a speaker's gender influences the acoustics of unvoiced (U) sounds. U segments of utterances are replaced by the corresponding segments of a speaker of opposite gender to prepare modified utterances. Humans are asked to judge if the modified utterance is spoken by one or two speakers. The experiments show that hum…

    Submitted 16 July, 2018; originally announced July 2018.

    Comments: 2 Figures, 5 Pages

  17. arXiv:1806.00516  [pdf, other]

    eess.AS cs.SD

    DNN Based Speech Enhancement for Unseen Noises Using Monte Carlo Dropout

    Authors: Nazreen P M, A G Ramakrishnan

    Abstract: In this work, we propose the use of dropouts as a Bayesian estimator for increasing the generalizability of a deep neural network (DNN) for speech enhancement. By using Monte Carlo (MC) dropout, we show that the DNN performs better enhancement in unseen noise and SNR conditions. The DNN is trained on speech corrupted with Factory2, M109, Babble, Leopard and Volvo noises at SNRs of 0, 5 and 10 dB a…

    Submitted 1 June, 2018; originally announced June 2018.

  18. arXiv:1805.09400  [pdf, other]

    cs.CV

    A hybrid approach of interpolations and CNN to obtain super-resolution

    Authors: Ram Krishna Pandey, A G Ramakrishnan

    Abstract: We propose a novel architecture that learns an end-to-end mapping function to improve the spatial resolution of the input natural images. The model is unique in forming a nonlinear combination of three traditional interpolation techniques using the convolutional neural network. Another proposed architecture uses a skip connection with nearest neighbor interpolation, achieving almost similar result…

    Submitted 23 May, 2018; originally announced May 2018.

    Report number: TIP-19077-2018

  19. arXiv:1805.09233  [pdf, other]

    cs.CV

    Segmentation of Liver Lesions with Reduced Complexity Deep Models

    Authors: Ram Krishna Pandey, Aswin Vasan, A G Ramakrishnan

    Abstract: We propose a computationally efficient architecture that learns to segment lesions from CT images of the liver. The proposed architecture uses bilinear interpolation with sub-pixel convolution at the last layer to upscale the coarse features in the bottleneck architecture. Since bilinear interpolation and sub-pixel convolution do not have any learnable parameter, our overall model is faster and occupi…

    Submitted 23 May, 2018; originally announced May 2018.
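
Sub-pixel convolution is commonly implemented as a convolution that outputs C·r² channels followed by a parameter-free "pixel shuffle" that rearranges them into an r-times larger C-channel map. The NumPy sketch below shows only that rearrangement, assuming the channel ordering used by PyTorch's `nn.PixelShuffle`; the bilinear-interpolation branch and the rest of the paper's architecture are not shown.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange (C*r*r, H, W) -> (C, H*r, W*r) with no learnable parameters."""
    c_r2, h, w = x.shape
    if c_r2 % (r * r):
        raise ValueError("channel count must be divisible by r*r")
    c = c_r2 // (r * r)
    # Input channel c*r*r + i*r + j lands at output pixel (c, h*r + i, w*r + j).
    return x.reshape(c, r, r, h, w).transpose(0, 3, 1, 4, 2).reshape(c, h * r, w * r)

feat = np.arange(8, dtype=float).reshape(8, 1, 1)   # 8 channels -> 2 channels, r = 2
out = pixel_shuffle(feat, 2)
# out[0] == [[0, 1], [2, 3]] and out[1] == [[4, 5], [6, 7]]
```

Because the shuffle is a pure reshape and transpose, all learnable capacity sits in the preceding convolution, which is consistent with the abstract's point about parameter-free upscaling.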

  20. arXiv:1701.08835  [pdf, other]

    cs.CV

    Language Independent Single Document Image Super-Resolution using CNN for improved recognition

    Authors: Ram Krishna Pandey, A G Ramakrishnan

    Abstract: Recognition of document images has important applications in restoring old and classical texts. The problem involves quality improvement before passing it to a properly trained OCR to get accurate recognition of the text. The image enhancement and quality improvement constitute important steps as subsequent recognition depends upon the quality of the input image. There are scenarios when high res…

    Submitted 30 January, 2017; originally announced January 2017.

  21. arXiv:1609.09764  [pdf, ps, other]

    cs.SD

    Adaptive dictionary based approach for background noise and speaker classification and subsequent source separation

    Authors: K V Vijay Girish, A G Ramakrishnan, T V Ananthapadmanabha

    Abstract: A judicious combination of dictionary learning methods, block sparsity and source recovery algorithm are used in a hierarchical manner to identify the noises and the speakers from a noisy conversation between two people. Conversations are simulated using speech from two speakers, each with a different background noise, with varied SNR values, down to -10 dB. Ten each of randomly chosen male and fe…

    Submitted 28 October, 2016; v1 submitted 30 September, 2016; originally announced September 2016.

    Comments: 12 pages

  22. arXiv:1609.05104  [pdf, other]

    cs.SD cs.CL

    Intrinsic normalization and extrinsic denormalization of formant data of vowels

    Authors: T. V. Ananthapadmanabha, A. G. Ramakrishnan

    Abstract: Using a known speaker-intrinsic normalization procedure, formant data are scaled by the reciprocal of the geometric mean of the first three formant frequencies. This reduces the influence of the talker but results in a distorted vowel space. The proposed speaker-extrinsic procedure re-scales the normalized values by the mean formant values of vowels. When tested on the formant data of vowels publi…

    Submitted 10 December, 2016; v1 submitted 16 September, 2016; originally announced September 2016.

    Comments: 18 pages, 8 figures. Title has been revised. Appendix has been added to include more figures and to clarify 'hypothesize-test' procedure, JASA-EL, 2016
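
A minimal sketch of the speaker-intrinsic step as the abstract states it: each token's (F1, F2, F3) is divided by their geometric mean (F1·F2·F3)^(1/3). Scaling all three formants of a talker by the same factor leaves the result unchanged, which is why the talker's influence is reduced. The formant values below are made up for illustration, and the extrinsic re-scaling step is not reproduced because its exact form is not given in the truncated abstract.

```python
import numpy as np

def intrinsic_normalize(formants):
    """Scale each token's (F1, F2, F3) by the reciprocal of their geometric mean.

    formants: (N, 3) array in Hz, one row per vowel token.
    Returns dimensionless (N, 3) values whose per-row geometric mean is 1.
    """
    F = np.asarray(formants, dtype=float)
    g = np.exp(np.log(F).mean(axis=1, keepdims=True))   # (F1*F2*F3) ** (1/3)
    return F / g

# Two hypothetical talkers producing the "same" vowel, the second with every
# formant 20 % higher: after normalization the two tokens coincide.
tokens = np.array([[300.0, 2300.0, 3000.0],
                   [360.0, 2760.0, 3600.0]])
norm = intrinsic_normalize(tokens)
```

Computing the geometric mean as the exponential of the mean log frequency is numerically equivalent to the cube root of the product and avoids overflow for longer feature vectors.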

  23. arXiv:1510.07774  [pdf, ps, other]

    cs.SD

    A dictionary learning and source recovery based approach to classify diverse audio sources

    Authors: K V Vijay Girish, T V Ananthapadmanabha, A G Ramakrishnan

    Abstract: A dictionary learning based audio source classification algorithm is proposed to classify a sample audio signal as one amongst a finite set of different audio sources. Cosine similarity measure is used to select the atoms during dictionary learning. Based on three objective measures proposed, namely, signal to distortion ratio (SDR), the number of non-zero weights and the sum of weights, a frame-w…

    Submitted 27 October, 2015; originally announced October 2015.

    Comments: 5 pages, 5 figures

    ACM Class: H.5.1
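
A simplified Python sketch of classifying a frame by the signal-to-distortion ratio (SDR) of its reconstruction from each source's dictionary. Assumptions: the dictionaries are random stand-ins rather than learned ones, the weights come from ordinary least squares instead of the paper's source recovery algorithm, and only the SDR measure (not the count or sum of weights) is used. `classify_frame` is a hypothetical helper.

```python
import numpy as np

def sdr_db(x, x_hat, eps=1e-12):
    """Signal-to-distortion ratio in dB: 10 log10(||x||^2 / ||x - x_hat||^2)."""
    return 10.0 * np.log10((np.sum(x ** 2) + eps) / (np.sum((x - x_hat) ** 2) + eps))

def classify_frame(frame, dictionaries):
    """Pick the source whose dictionary reconstructs the frame with the highest SDR.

    dictionaries: list of (frame_len, n_atoms) arrays, one per audio source.
    Weights are found by ordinary least squares (an assumed stand-in).
    """
    scores = []
    for D in dictionaries:
        w, *_ = np.linalg.lstsq(D, frame, rcond=None)
        scores.append(sdr_db(frame, D @ w))
    return int(np.argmax(scores)), scores

rng = np.random.default_rng(1)
dicts = [rng.standard_normal((64, 8)) for _ in range(3)]   # hypothetical dictionaries
frame = dicts[2] @ rng.standard_normal(8)                   # frame built from source 2
label, scores = classify_frame(frame, dicts)
```

A frame that lies in the span of one source's atoms is reconstructed almost exactly by that dictionary and only partially by the others, so its SDR is far higher for the correct source.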

  24. arXiv:1506.04828  [pdf, ps]

    cs.CL cs.SD

    Significance of the levels of spectral valleys with application to front/back distinction of vowel sounds

    Authors: T. V. Ananthapadmanabha, A. G. Ramakrishnan, Shubham Sharma

    Abstract: An objective critical distance (OCD) has been defined as that spacing between adjacent formants, when the level of the valley between them reaches the mean spectral level. The measured OCD lies in the same range (viz., 3-3.5 bark) as the critical distance determined by subjective experiments for similar experimental conditions. The level of spectral valley serves a purpose similar to that of the s…

    Submitted 5 October, 2015; v1 submitted 16 June, 2015; originally announced June 2015.

    Comments: 39 pages, 6 figures, submitted to JASA

  25. arXiv:1411.1267  [pdf, ps, other]

    cs.SD

    An Interesting Property of LPCs for Sonorant Vs Fricative Discrimination

    Authors: T. V. Ananthapadmanabha, A. G. Ramakrishnan, Pradeep Balachandran

    Abstract: Linear prediction (LP) technique estimates an optimum all-pole filter of a given order for a frame of speech signal. The coefficients of the all-pole filter, 1/A(z) are referred to as LP coefficients (LPCs). The gain of the inverse of the all-pole filter, A(z) at z = 1, i.e., at frequency = 0, A(1) corresponds to the sum of LPCs, which has the property of being lower (higher) than a threshold for t…

    Submitted 5 November, 2014; originally announced November 2014.

    Comments: 5 pages including references
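
A sketch of the quantity described, assuming the autocorrelation method of LP analysis and the convention A(z) = 1 - sum_k a_k z^-k, so that A(1) is the sum of the coefficients of A(z). Two synthetic signals stand in for speech: low-pass noise (energy near 0 Hz, sonorant-like) and high-pass noise (fricative-like). The threshold used in the paper is cut off in the abstract and is not assumed here; the helper names are hypothetical.

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method predictor coefficients a_1..a_p (x[n] ~ sum_k a_k x[n-k])."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)]) / n
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    return np.linalg.solve(r[idx], r[1:])       # Toeplitz normal equations R a = r

def inverse_filter_dc_gain(x, order=10):
    """A(1): the sum of the coefficients of A(z) = 1 - sum_k a_k z^-k."""
    a_poly = np.concatenate(([1.0], -lpc(x, order)))
    return a_poly.sum()

rng = np.random.default_rng(0)
w = rng.standard_normal(8000)
sonorant_like = w[1:] + w[:-1]     # low-pass: energy concentrated near 0 Hz
fricative_like = w[1:] - w[:-1]    # high-pass: little energy near 0 Hz
g_son = inverse_filter_dc_gain(sonorant_like)
g_fri = inverse_filter_dc_gain(fricative_like)
```

For these idealized signals and an order-10 model, theory gives A(1) of about 0.55 for the low-pass signal and about 6 for the high-pass one: a large spectral level near 0 Hz in 1/A(z) means a small |A(1)|, and vice versa.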

  26. arXiv:1411.0370  [pdf, ps, other]

    cs.SD

    Detection of transitions between broad phonetic classes in a speech signal

    Authors: T V Ananthapadmanabha, K V Vijay Girish, A G Ramakrishnan

    Abstract: Detection of transitions between broad phonetic classes in a speech signal is an important problem which has applications such as landmark detection and segmentation. The proposed hierarchical method detects silence to non-silence transitions, high amplitude (mostly sonorants) to low amplitude (mostly fricatives/affricates/stop bursts) transitions and vice-versa. A subset of the extremum (minimu…

    Submitted 3 November, 2014; originally announced November 2014.

    Comments: 12 pages, 5 figures

  27. arXiv:1407.6315  [pdf, ps, other]

    cs.AI cs.LG cs.NE math.OC

    Quadratically constrained quadratic programming for classification using particle swarms and applications

    Authors: Deepak Kumar, A G Ramakrishnan

    Abstract: Particle swarm optimization is used in several combinatorial optimization problems. In this work, particle swarms are used to solve quadratic programming problems with quadratic constraints. The approach of particle swarms is an example for interior point methods in optimization as an iterative technique. This approach is novel and deals with classification problems without the use of a traditiona…

    Submitted 23 July, 2014; originally announced July 2014.

    Comments: 17 pages, 3 figures
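
A minimal global-best particle swarm sketch on a toy quadratically constrained quadratic program: minimize (x - 2)^2 + (y - 2)^2 subject to x^2 + y^2 <= 1, whose optimum is (1/√2, 1/√2) with value 9 - 4√2 ≈ 3.343. Infeasible positions are simply given infinite fitness, so only feasible points can become personal or global bests. That constraint handling, the coefficient values (w, c1, c2) and the toy problem are assumptions, not the paper's classification formulation.

```python
import numpy as np

def pso_minimize(f, feasible, lo, hi, n_particles=30, iters=300, seed=0,
                 w=0.7, c1=1.5, c2=1.5):
    """Global-best PSO; infeasible positions get +inf fitness (assumed handling)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, dtype=float), np.asarray(hi, dtype=float)
    x = rng.uniform(lo, hi, size=(n_particles, lo.size))
    v = np.zeros_like(x)

    def fitness(pts):
        return np.array([f(p) if feasible(p) else np.inf for p in pts])

    pbest, pbest_f = x.copy(), fitness(x)
    g = pbest[np.argmin(pbest_f)].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        # Inertia + pull toward each particle's best + pull toward the swarm's best.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        fx = fitness(x)
        better = fx < pbest_f
        pbest[better], pbest_f[better] = x[better], fx[better]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, pbest_f.min()

def objective(p):
    return (p[0] - 2.0) ** 2 + (p[1] - 2.0) ** 2

def in_disk(p):
    return p @ p <= 1.0      # quadratic constraint x^2 + y^2 <= 1

best_x, best_f = pso_minimize(objective, in_disk, lo=[-1.0, -1.0], hi=[1.0, 1.0])
```

The swarm needs no gradients of the objective or constraint, which is the usual reason for applying it to constrained problems like this one.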

  28. arXiv:1407.1285  [pdf, other]

    cs.ET

    Compressed EEG Acquisition with Limited Channels using Estimated Signal Correlation

    Authors: J V Satyanarayana, A G Ramakrishnan

    Abstract: Nearby scalp channels in multi-channel EEG data exhibit high correlation. A question that naturally arises is whether it is required to record signals from all the electrodes in a group of closely spaced electrodes in a typical measurement setup. One could save on the number of channels that are recorded, if it were possible to reconstruct the omitted channels to the accuracy needed for identifyin…

    Submitted 4 July, 2014; originally announced July 2014.
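
A hedged sketch of reconstructing omitted channels from correlated neighbors using the linear minimum-mean-square-error estimate x_o ≈ C_or C_rr^-1 x_r, with covariances estimated from a calibration recording in which all channels were present. The synthetic data (two latent sources mixed into eight channels), the channel split and the helper name `fit_channel_estimator` are assumptions; the paper's actual estimator may differ.

```python
import numpy as np

def fit_channel_estimator(calib, recorded, omitted):
    """Hypothetical helper: weights W such that x_omitted ~ W @ x_recorded.

    calib: (n_channels, n_samples) calibration EEG with every channel recorded.
    Uses the linear MMSE solution W = C_or C_rr^-1 from sample covariances.
    """
    X = calib - calib.mean(axis=1, keepdims=True)
    C = X @ X.T / X.shape[1]
    C_rr = C[np.ix_(recorded, recorded)]
    C_or = C[np.ix_(omitted, recorded)]
    # C_rr is symmetric, so solving C_rr W^T = C_or^T gives W = C_or C_rr^-1.
    return np.linalg.solve(C_rr, C_or.T).T

rng = np.random.default_rng(0)
mixing = rng.standard_normal((8, 2))   # 2 latent sources seen by 8 scalp channels

def simulate(n_samples):
    sources = rng.standard_normal((2, n_samples))
    return mixing @ sources + 0.01 * rng.standard_normal((8, n_samples))

calib, session = simulate(5000), simulate(2000)
recorded, omitted = [0, 1, 2, 3, 4], [5, 6, 7]
W = fit_channel_estimator(calib, recorded, omitted)
estimate = W @ session[recorded]
rel_err = np.linalg.norm(estimate - session[omitted]) / np.linalg.norm(session[omitted])
```

The estimate works only to the extent that the omitted channels are linearly predictable from the recorded ones, which is the high inter-channel correlation the abstract points to.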

  29. arXiv:1208.6137  [pdf, ps, other]

    cs.CV

    Benchmarking recognition results on word image datasets

    Authors: Deepak Kumar, M N Anil Prasad, A G Ramakrishnan

    Abstract: We have benchmarked the maximum obtainable recognition accuracy on various word image datasets using manual segmentation and a currently available commercial OCR. We have developed a Matlab program, with graphical user interface, for semi-automated pixel level segmentation of word images. We discuss the advantages of pixel level annotation. We have covered five databases adding up to over 3600 wor…

    Submitted 30 August, 2012; originally announced August 2012.

    Comments: 16 pages, 4 figures

    ACM Class: I.7; I.7.5; I.4.6; I.4.8; I.2.10