
Showing 1–22 of 22 results for author: Rao, P

Searching in archive eess.
  1. arXiv:2405.19426  [pdf, other]

    cs.CL cs.SD eess.AS

    Deep Learning for Assessment of Oral Reading Fluency

    Authors: Mithilesh Vaidya, Binaya Kumar Sahoo, Preeti Rao

    Abstract: Reading fluency assessment is a critical component of literacy programmes, serving to guide and monitor early education interventions. Given the resource intensive nature of the exercise when conducted by teachers, the development of automatic tools that can operate on audio recordings of oral reading is attractive as an objective and highly scalable solution. Multiple complex aspects such as accu…

    Submitted 1 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  2. arXiv:2405.09572  [pdf, other]

    eess.SP cs.AI

    Deep Neural Operator Enabled Digital Twin Modeling for Additive Manufacturing

    Authors: Ning Liu, Xuxiao Li, Manoj R. Rajanna, Edward W. Reutzel, Brady Sawyer, Prahalada Rao, Jim Lua, Nam Phan, Yue Yu

    Abstract: A digital twin (DT), with the components of a physics-based model, a data-driven model, and a machine learning (ML) enabled efficient surrogate, behaves as a virtual twin of the real-world physical process. In terms of Laser Powder Bed Fusion (L-PBF) based additive manufacturing (AM), a DT can predict the current and future states of the melt pool and the resulting defects corresponding to the inp…

    Submitted 12 May, 2024; originally announced May 2024.

  3. arXiv:2306.09384  [pdf, other]

    eess.AS cs.AI

    MobileASR: A resource-aware on-device learning framework for user voice personalization applications on mobile phones

    Authors: Zitha Sasindran, Harsha Yelchuri, Pooja Rao, T. V. Prabhakar

    Abstract: We describe a comprehensive methodology for developing user-voice personalized automatic speech recognition (ASR) models by effectively training models on mobile phones, allowing user data and models to be stored and used locally. To achieve this, we propose a resource-aware sub-model-based training approach that considers the RAM, and battery capabilities of mobile phones. By considering the eval…

    Submitted 9 November, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: Accepted in AIMLSystems 2023

  4. arXiv:2209.00291  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    Generating Coherent Drum Accompaniment With Fills And Improvisations

    Authors: Rishabh Dahale, Vaibhav Talwadker, Preeti Rao, Prateek Verma

    Abstract: Creating a complex work of art like music necessitates profound creativity. With recent advancements in deep learning and powerful models such as transformers, there has been huge progress in automatic music generation. In an accompaniment generation context, creating a coherent drum pattern with apposite fills and improvisations at proper locations in a song is a challenging task even for an expe…

    Submitted 1 September, 2022; originally announced September 2022.

    Comments: 8 pages, 7 figures, 23rd International Society for Music Information Retrieval Conference (ISMIR 2022), Bengaluru, India

  5. arXiv:2204.03166  [pdf]

    eess.AS cs.SD

    Musical Information Extraction from the Singing Voice

    Authors: Preeti Rao

    Abstract: Music information retrieval is currently an active research area that addresses the extraction of musically important information from audio signals, and the applications of such information. The extracted information can be used for search and retrieval of music in recommendation systems, or to aid musicological studies or even in music learning. Sophisticated signal processing techniques are app…

    Submitted 6 April, 2022; originally announced April 2022.

  6. arXiv:2203.06583  [pdf]

    cs.SD cs.AI eess.AS

    Bi-Sampling Approach to Classify Music Mood leveraging Raga-Rasa Association in Indian Classical Music

    Authors: Mohan Rao B C, Vinayak Arkachaari, Harsha M N, Sushmitha M N, Gayathri Ramesh K K, Ullas M S, Pathi Mohan Rao, Sudha G, Narayana Darapaneni

    Abstract: The impact of Music on the mood or emotion of the listener is a well-researched area in human psychology and behavioral science. In Indian classical music, ragas are the melodic structure that defines the various styles and forms of the music. Each raga has been found to evoke a specific emotion in the listener. With the advent of advanced capabilities of audio signal processing and the applicatio…

    Submitted 13 March, 2022; originally announced March 2022.

  7. arXiv:2112.03871  [pdf, ps, other]

    eess.AS cs.SD

    Training end-to-end speech-to-text models on mobile phones

    Authors: Zitha S, Raghavendra Rao Suresh, Pooja Rao, T. V. Prabhakar

    Abstract: Training the state-of-the-art speech-to-text (STT) models in mobile devices is challenging due to its limited resources relative to a server environment. In addition, these models are trained on generic datasets that are not exhaustive in capturing user-specific characteristics. Recently, on-device personalization techniques have been making strides in mitigating the problem. Although many current…

    Submitted 7 December, 2021; originally announced December 2021.

  8. arXiv:2112.00635  [pdf, other]

    eess.AS cs.SD eess.SP

    Predicting lexical skills from oral reading with acoustic measures

    Authors: Charvi Vitthal, Shreeharsha B S, Kamini Sabu, Preeti Rao

    Abstract: Literacy assessment is an important activity for education administrators across the globe. Typically achieved in a school setting by testing a child's oral reading, it is intensive in human resources. While automatic speech recognition (ASR) is a potential solution to the problem, it tends to be computationally expensive for hand-held devices apart from needing language and accent-specific speech…

    Submitted 1 December, 2021; originally announced December 2021.

  9. arXiv:2110.14273  [pdf, other]

    cs.CL cs.SD eess.AS

    Deep Learning For Prominence Detection In Children's Read Speech

    Authors: Mithilesh Vaidya, Kamini Sabu, Preeti Rao

    Abstract: The detection of perceived prominence in speech has attracted approaches ranging from the design of linguistic knowledge-based acoustic features to the automatic feature learning from suprasegmental attributes such as pitch and intensity contours. We present here, in contrast, a system that operates directly on segmented speech waveforms to learn features relevant to prominent word detection for c…

    Submitted 27 October, 2021; originally announced October 2021.

    Comments: Under review at ICASSP 2022. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  10. arXiv:2109.12434  [pdf, other]

    q-bio.NC cs.AI cs.LG cs.NE eess.SY

    Emergent behavior and neural dynamics in artificial agents tracking turbulent plumes

    Authors: Satpreet Harcharan Singh, Floris van Breugel, Rajesh P. N. Rao, Bingni Wen Brunton

    Abstract: Tracking a turbulent plume to locate its source is a complex control problem because it requires multi-sensory integration and must be robust to intermittent odors, changing wind direction, and variable plume statistics. This task is routinely performed by flying insects, often over long distances, in pursuit of food or mates. Several aspects of this remarkable behavior have been studied in detail…

    Submitted 17 December, 2021; v1 submitted 25 September, 2021; originally announced September 2021.

    ACM Class: I.2.6; I.2.0; I.5.1

  11. arXiv:2104.09064  [pdf, other]

    eess.AS cs.SD

    Automatic Stroke Classification of Tabla Accompaniment in Hindustani Vocal Concert Audio

    Authors: Rohit M. A., Preeti Rao

    Abstract: The tabla is a unique percussion instrument due to the combined harmonic and percussive nature of its timbre, and the contrasting harmonic frequency ranges of its two drums. This allows a tabla player to uniquely emphasize parts of the rhythmic cycle (theka) in order to mark the salient positions. An analysis of the loudness dynamics and timing deviations at various cycle positions is an important…

    Submitted 19 April, 2021; originally announced April 2021.

    Comments: To appear in the JOURNAL OF ACOUSTICAL SOCIETY OF INDIA, April 2021

  12. arXiv:2104.05488  [pdf, other]

    cs.CL cs.SD eess.AS

    CNN Encoding of Acoustic Parameters for Prominence Detection

    Authors: Kamini Sabu, Mithilesh Vaidya, Preeti Rao

    Abstract: Expressive reading, considered the defining attribute of oral reading fluency, comprises the prosodic realization of phrasing and prominence. In the context of evaluating oral reading, it helps to establish the speaker's comprehension of the text. We consider a labeled dataset of children's reading recordings for the speaker-independent detection of prominent words using acoustic-prosodic and lexi…

    Submitted 27 January, 2022; v1 submitted 12 April, 2021; originally announced April 2021.

    Comments: 5 pages, 2 figures, 6 tables, Submitted to INTERSPEECH 2021

  13. arXiv:2103.04346  [pdf, other]

    eess.AS cs.SD

    An Optimized Signal Processing Pipeline for Syllable Detection and Speech Rate Estimation

    Authors: Kamini Sabu, Syomantak Chaudhuri, Preeti Rao, Mahesh Patil

    Abstract: Syllable detection is an important speech analysis task with applications in speech rate estimation, word segmentation, and automatic prosody detection. Based on the well understood acoustic correlates of speech articulation, it has been realized by local peak picking on a frequency-weighted energy contour that represents vowel sonority. While several of the analysis parameters are set based on kn…

    Submitted 7 March, 2021; originally announced March 2021.

    Comments: 6 pages, 3 figures, accepted in National Conference on Communications (NCC) 2020

  14. arXiv:2008.08405  [pdf, other]

    eess.AS cs.LG cs.SD

    HpRNet : Incorporating Residual Noise Modeling for Violin in a Variational Parametric Synthesizer

    Authors: Krishna Subramani, Preeti Rao

    Abstract: Generative Models for Audio Synthesis have been gaining momentum in the last few years. More recently, parametric representations of the audio signal have been incorporated to facilitate better musical control of the synthesized output. In this work, we investigate a parametric model for violin tones, in particular the generative modeling of the residual bow noise to make for more natural tone qua…

    Submitted 19 August, 2020; originally announced August 2020.

    Comments: https://github.com/SubramaniKrishna/HpRNet

  15. arXiv:2008.00756  [pdf, other]

    eess.AS cs.IR cs.LG

    Structure and Automatic Segmentation of Dhrupad Vocal Bandish Audio

    Authors: Rohit M. A., Preeti Rao

    Abstract: A Dhrupad vocal concert comprises a composition section that is interspersed with improvised episodes of increased rhythmic activity involving the interaction between the vocals and the percussion. Tracking the changing rhythmic density, in relation to the underlying metric tempo of the piece, thus facilitates the detection and labeling of the improvised sections in the concert structure. This wor…

    Submitted 3 August, 2020; originally announced August 2020.

    Comments: Part of this work published in ISMIR 2020

  16. VaPar Synth -- A Variational Parametric Model for Audio Synthesis

    Authors: Krishna Subramani, Preeti Rao, Alexandre D'Hooge

    Abstract: With the advent of data-driven statistical modeling and abundant computing power, researchers are turning increasingly to deep learning for audio synthesis. These methods try to model audio signals directly in the time or frequency domain. In the interest of more flexible control over the generated sound, it could be more useful to work with a parametric representation of the signal which correspo…

    Submitted 30 March, 2020; originally announced April 2020.

    Comments: https://github.com/SubramaniKrishna/VaPar-Synth , Accepted in ICASSP 2020

  17. arXiv:2002.06595  [pdf, other]

    eess.AS cs.LG cs.SD

    Speech-to-Singing Conversion in an Encoder-Decoder Framework

    Authors: Jayneel Parekh, Preeti Rao, Yi-Hsuan Yang

    Abstract: In this paper our goal is to convert a set of spoken lines into sung ones. Unlike previous signal processing based methods, we take a learning based approach to the problem. This allows us to automatically model various aspects of this transformation, thus overcoming dependence on specific inputs such as high quality singing templates or phoneme-score synchronization information. Specifically, we…

    Submitted 16 February, 2020; originally announced February 2020.

    Comments: Accepted at IEEE ICASSP 2020

  18. arXiv:2001.08349  [pdf, other]

    q-bio.NC cs.CV eess.IV

    Investigating naturalistic hand movements by behavior mining in long-term video and neural recordings

    Authors: Satpreet H. Singh, Steven M. Peterson, Rajesh P. N. Rao, Bingni W. Brunton

    Abstract: Recent technological advances in brain recording and artificial intelligence are propelling a new paradigm in neuroscience beyond the traditional controlled experiment. Rather than focusing on cued, repeated trials, naturalistic neuroscience studies neural processes underlying spontaneous behaviors performed in unconstrained settings. However, analyzing such unstructured data lacking a priori expe…

    Submitted 19 June, 2020; v1 submitted 22 January, 2020; originally announced January 2020.

  19. arXiv:1911.08335  [pdf, other]

    eess.AS cs.LG cs.SD

    Generative Audio Synthesis with a Parametric Model

    Authors: Krishna Subramani, Alexandre D'Hooge, Preeti Rao

    Abstract: Use a parametric representation of audio to train a generative model in the interest of obtaining more flexible control over the generated sound.

    Submitted 15 November, 2019; originally announced November 2019.

    Comments: ISMIR 2019 Late Breaking/Demo

  20. arXiv:1906.08916  [pdf, other]

    cs.SD eess.AS

    Understanding and Classifying Cultural Music Using Melodic Features Case Of Hindustani, Carnatic And Turkish Music

    Authors: Amruta Vidwans, Prateek Verma, Preeti Rao

    Abstract: We present a melody based classification of musical styles by exploiting the pitch and energy based characteristics derived from the audio signal. Three prominent musical styles were chosen which have improvisation as integral part with similar melodic principles, theme, and structure of concerts namely, Hindustani, Carnatic and Turkish music. Listeners of one or more of these genres can discrimin…

    Submitted 20 June, 2019; originally announced June 2019.

    Comments: The work appeared in the 3rd CompMusic Workshop for Developing Computational models for the Discovery of the Worlds Music held at IIT Madras at Chennai in 2013

  21. arXiv:1904.03710  [pdf, other]

    cs.CV eess.IV

    Planar Geometry and Image Recovery from Motion-Blur

    Authors: Kuldeep Purohit, Subeesh Vasu, M. Purnachandra Rao, A. N. Rajagopalan

    Abstract: Existing works on motion deblurring either ignore the effects of depth-dependent blur or work with the assumption of a multi-layered scene wherein each layer is modeled in the form of fronto-parallel plane. In this work, we consider the case of 3D scenes with piecewise planar structure i.e., a scene that can be modeled as a combination of multiple planes with arbitrary orientations. We first propo…

    Submitted 6 February, 2022; v1 submitted 7 April, 2019; originally announced April 2019.

  22. arXiv:1807.11138  [pdf, other]

    cs.SD eess.AS

    Audio segmentation based on melodic style with hand-crafted features and with convolutional neural networks

    Authors: Amruta Vidwans, Nachiket Deo, Preeti Rao

    Abstract: We investigate methods for the automatic labeling of the taan section, a prominent structural component of the Hindustani Khayal vocal concert. The taan contains improvised raga-based melody rendered in the highly distinctive style of rapid pitch and energy modulations of the voice. We propose computational features that capture these specific high-level characteristics of the singing voice in the…

    Submitted 29 July, 2018; originally announced July 2018.

    Comments: This work was done in 2015 at Indian Institute of Technology, Bombay, as a part of the ERC grant agreement 267583 (CompMusic) project