Showing 1–3 of 3 results for author: Nanayakkara, S

  1. Example-Based Framework for Perceptually Guided Audio Texture Generation

    Authors: Purnima Kamath, Chitralekha Gupta, Lonce Wyse, Suranga Nanayakkara

    Abstract: Controllable generation using StyleGANs is usually achieved by training the model using labeled data. For audio textures, however, there is currently a lack of large semantically labeled datasets. Therefore, to control generation, we develop a method for semantic control over an unconditionally trained StyleGAN in the absence of such labeled datasets. In this paper, we propose an example-based fra…

    Submitted 14 April, 2024; v1 submitted 22 August, 2023; originally announced August 2023.

    Comments: Accepted for publication at IEEE Transactions on Audio, Speech and Language Processing

  2. arXiv:2304.11648

    eess.AS cs.SD

    Towards Controllable Audio Texture Morphing

    Authors: Chitralekha Gupta, Purnima Kamath, Yize Wei, Zhuoyao Li, Suranga Nanayakkara, Lonce Wyse

    Abstract: In this paper, we propose a data-driven approach to train a Generative Adversarial Network (GAN) conditioned on "soft-labels" distilled from the penultimate layer of an audio classifier trained on a target set of audio texture classes. We demonstrate that interpolation between such conditions or control vectors provides smooth morphing between the generated audio textures, and shows similar or bet…

    Submitted 23 April, 2023; originally announced April 2023.

    Comments: Accepted to ICASSP 2023

  3. arXiv:2008.06682

    eess.AS cs.CL cs.SD

    Jointly Fine-Tuning "BERT-like" Self Supervised Models to Improve Multimodal Speech Emotion Recognition

    Authors: Shamane Siriwardhana, Andrew Reis, Rivindu Weerasekera, Suranga Nanayakkara

    Abstract: Multimodal emotion recognition from speech is an important area in affective computing. Fusing multiple data modalities and learning representations with limited amounts of labeled data is a challenging task. In this paper, we explore the use of modality-specific "BERT-like" pretrained Self Supervised Learning (SSL) architectures to represent both speech and text modalities for the task of multimo…

    Submitted 15 August, 2020; originally announced August 2020.

    Comments: Accepted to INTERSPEECH 2020