
Showing 1–8 of 8 results for author: Ambikairajah, E

Searching in archive eess.
  1. arXiv:2406.12236

    eess.AS cs.SD eess.SP

    Binaural Selective Attention Model for Target Speaker Extraction

    Authors: Hanyu Meng, Qiquan Zhang, Xiangyu Zhang, Vidhyasaharan Sethu, Eliathamby Ambikairajah

    Abstract: The remarkable ability of humans to selectively focus on a target speaker in cocktail party scenarios is facilitated by binaural audio processing. In this paper, we present a binaural time-domain Target Speaker Extraction model based on the Filter-and-Sum Network (FaSNet). Inspired by human selective hearing, our proposed model introduces target speaker embedding into separators using a multi-head…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024
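
    The filter-and-sum principle named in this abstract can be illustrated with a classic time-domain filter-and-sum beamformer: each microphone channel is convolved with its own FIR filter and the filtered channels are added. This is a minimal sketch with fixed, hand-chosen filters; in FaSNet the filters are estimated by a neural network, which is not shown, and the helper names `fir_filter` and `filter_and_sum` are hypothetical.

```python
def fir_filter(x, h):
    """Causal FIR filtering: y[n] = sum_k h[k] * x[n - k], treating x[<0] as zero."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * x[n - k]
        y.append(acc)
    return y


def filter_and_sum(channels, filters):
    """Filter each channel with its own FIR filter, then sum across channels.

    Assumes all channels have the same length.
    """
    if len(channels) != len(filters):
        raise ValueError("need exactly one filter per channel")
    out = [0.0] * len(channels[0])
    for x, h in zip(channels, filters):
        for n, v in enumerate(fir_filter(x, h)):
            out[n] += v
    return out


# Toy two-channel input: the right channel lags the left by one sample,
# so delaying the left channel by one sample aligns the two before summing.
left = [0.0, 1.0, 0.0, 0.0]
right = [0.0, 0.0, 1.0, 0.0]
print(filter_and_sum([left, right], [[0.0, 0.5], [0.5]]))  # [0.0, 0.0, 1.0, 0.0]
```

    Delay-and-sum is the special case where each filter is a pure delay; learned filters generalise this to frequency-dependent weighting.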

  2. arXiv:2406.11401

    eess.AS

    An Exploration of Length Generalization in Transformer-Based Speech Enhancement

    Authors: Qiquan Zhang, Hongxu Zhu, Xinyuan Qian, Eliathamby Ambikairajah, Haizhou Li

    Abstract: The use of Transformer architectures has facilitated remarkable progress in speech enhancement. Training Transformers using substantially long speech utterances is often infeasible as self-attention suffers from quadratic complexity. It is a critical and unexplored challenge for a Transformer-based speech enhancement model to learn from short speech utterances and generalize to longer ones. In thi…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024
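
    The quadratic complexity mentioned in this abstract comes from the attention score matrix: for an utterance of L frames, every frame is compared with every other frame, giving an L x L matrix. The sketch below is plain single-head scaled dot-product attention on toy lists (no learned projections, no batching, no masking); it only shows where the L x L term arises and is not the model used in the paper.

```python
import math


def scaled_dot_product_attention(q, k, v):
    """Single-head attention on lists of vectors; q, k and v each have L rows."""
    d = len(q[0])
    # L x L score matrix: one dot product per (query frame, key frame) pair.
    scores = [[sum(qi * ki for qi, ki in zip(q_row, k_row)) / math.sqrt(d)
               for k_row in k] for q_row in q]
    out = []
    for row in scores:
        m = max(row)                                   # subtract max for numerical stability
        weights = [math.exp(s - m) for s in row]
        total = sum(weights)
        weights = [w / total for w in weights]         # softmax over key frames
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v))
                    for j in range(len(v[0]))])
    return out, scores


frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # L = 3 toy feature frames
out, scores = scaled_dot_product_attention(frames, frames, frames)
print(len(scores), len(scores[0]))  # 3 3
```

    Doubling the number of frames quadruples the number of scores, which is why training on long utterances quickly becomes expensive.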

  3. arXiv:2405.12609

    eess.AS cs.SD

    Mamba in Speech: Towards an Alternative to Self-Attention

    Authors: Xiangyu Zhang, Qiquan Zhang, Hexin Liu, Tianyi Xiao, Xinyuan Qian, Beena Ahmed, Eliathamby Ambikairajah, Haizhou Li, Julien Epps

    Abstract: Transformer and its derivatives have achieved success in diverse tasks across computer vision, natural language processing, and speech processing. To reduce the complexity of computations within the multi-head self-attention mechanism in Transformer, Selective State Space Models (i.e., Mamba) were proposed as an alternative. Mamba exhibited its effectiveness in natural language processing and comp…

    Submitted 30 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.
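
    Unlike self-attention, a state space model processes a sequence with a recurrence whose cost grows linearly with its length. The sketch below is a heavily simplified, single-channel, scalar-state illustration of a selective SSM step: the step size delta depends on the current input, the state decay uses a zero-order-hold style exp(delta * a), and the input term uses the simpler delta * b. The parameter values, the softplus step function and the name `selective_scan` are illustrative assumptions; real Mamba layers use vector states, input-dependent B and C, and a hardware-aware parallel scan, none of which is shown.

```python
import math


def softplus(z):
    """Numerically stable softplus: log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(xs, a=-1.0, b=1.0, c=1.0, w_delta=1.0, bias_delta=0.0):
    """Scalar selective SSM: one state update per time step, i.e. O(L) in length.

    a < 0 keeps the recurrence stable; delta is recomputed from each input x_t.
    """
    h = 0.0
    ys = []
    for x in xs:
        delta = softplus(w_delta * x + bias_delta)   # input-dependent step size
        a_bar = math.exp(delta * a)                  # discretised state decay
        b_bar = delta * b                            # simplified input discretisation
        h = a_bar * h + b_bar * x                    # state update
        ys.append(c * h)                             # readout
    return ys


print(selective_scan([1.0, 0.0, 0.0, 0.0]))
```

    With zero input the step size is softplus(0) = ln 2, so the state halves at each step; a non-zero input changes delta and therefore how much of the past is kept.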

  4. arXiv:2404.06702

    eess.AS cs.SD eess.SP

    What is Learnt by the LEArnable Front-end (LEAF)? Adapting Per-Channel Energy Normalisation (PCEN) to Noisy Conditions

    Authors: Hanyu Meng, Vidhyasaharan Sethu, Eliathamby Ambikairajah

    Abstract: There is increasing interest in the use of the LEArnable Front-end (LEAF) in a variety of speech processing systems. However, there is a dearth of analyses of what is actually learnt and the relative importance of training the different components of the front-end. In this paper, we investigate this question on keyword spotting, speech-based emotion recognition and language identification tasks an…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: Interspeech 2023 Proceedings

    Journal ref: Interspeech 2023
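
    For reference, the standard PCEN formulation divides each filterbank channel's energy E(t) by a smoothed version M(t) of that same energy, M(t) = (1 - s) M(t - 1) + s E(t), and then applies root compression: PCEN(t) = (E(t) / (eps + M(t))^alpha + delta)^r - delta^r. The sketch below implements this fixed, non-learnable form for a single channel; the parameter values and the choice to initialise M with the first frame are illustrative assumptions, and in LEAF these parameters are learned, which is not shown.

```python
def pcen(energy, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-Channel Energy Normalisation for one frequency channel.

    energy: list of non-negative filterbank energies E(t), one per time frame.
    """
    out = []
    m = energy[0]  # initialise the smoother with the first frame (an assumption)
    for e in energy:
        m = (1.0 - s) * m + s * e                       # first-order IIR smoothing: M(t)
        agc = e / (eps + m) ** alpha                    # adaptive gain control
        out.append((agc + delta) ** r - delta ** r)     # root compression with offset
    return out


print(pcen([1.0, 2.0, 4.0, 3.0]))
```

    With alpha = 1 and eps = 0, multiplying every energy by the same positive constant leaves the output unchanged; this gain normalisation is what makes PCEN attractive under varying loudness and noise conditions.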

  5. arXiv:2401.09686

    eess.AS cs.SD

    An Empirical Study on the Impact of Positional Encoding in Transformer-based Monaural Speech Enhancement

    Authors: Qiquan Zhang, Meng Ge, Hongxu Zhu, Eliathamby Ambikairajah, Qi Song, Zhaoheng Ni, Haizhou Li

    Abstract: Transformer architecture has enabled recent progress in speech enhancement. Since Transformers are position-agnostic, positional encoding is the de facto standard component used to enable Transformers to distinguish the order of elements in a sequence. However, it remains unclear how positional encoding exactly impacts speech enhancement based on Transformer architectures. In this paper, we perform…

    Submitted 13 February, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024
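
    As a reference point, the sinusoidal absolute positional encoding of the original Transformer adds a fixed vector to each frame: PE(pos, 2i) = sin(pos / 10000^(2i/d)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d)), where d is the model dimension. The sketch below computes that table; it is only one of the schemes such a study could compare, the function name is hypothetical, and nothing here reflects the paper's findings.

```python
import math


def sinusoidal_positional_encoding(length, d_model):
    """Return a length x d_model table of sinusoidal positional encodings (d_model even)."""
    if d_model % 2:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(length):
        row = []
        for i in range(d_model // 2):
            angle = pos / (10000 ** (2 * i / d_model))
            row.extend([math.sin(angle), math.cos(angle)])  # dimensions 2i and 2i+1
        table.append(row)
    return table


pe = sinusoidal_positional_encoding(length=4, d_model=6)
print(pe[0])  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

    Each pair of dimensions is a sine/cosine at a different frequency, so the table can be computed for any length, including lengths longer than those seen in training.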

  6. arXiv:2108.05993

    eess.AS

    Joint Spatio-Temporal Discretisation of Nonlinear Active Cochlear Models

    Authors: T. Dang, V. Sethu, E. Ambikairajah, J. Epps, H. Li

    Abstract: Biologically inspired auditory models play an important role in developing effective audio representations that can be tightly integrated into speech and audio processing systems. Current computational models of the cochlea are typically expressed in terms of systems of differential equations and do not directly lend themselves to use in computational speech processing systems. Specifically, thes…

    Submitted 12 August, 2021; originally announced August 2021.

  7. A Novel Markovian Framework for Integrating Absolute and Relative Ordinal Emotion Information

    Authors: Jingyao Wu, Ting Dang, Vidhyasaharan Sethu, Eliathamby Ambikairajah

    Abstract: There is growing interest in affective computing for the representation and prediction of emotions along ordinal scales. However, the term ordinal emotion label has been used to refer to both absolute notions such as low or high arousal, as well as relative notions such as arousal is higher at one instance compared to another. In this paper, we introduce the terminology absolute and relative ordin…

    Submitted 10 August, 2021; originally announced August 2021.

    Comments: This work has been submitted to IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  8. arXiv:1909.01302

    cs.SD cs.NE eess.AS

    An efficient and perceptually motivated auditory neural encoding and decoding algorithm for spiking neural networks

    Authors: Zihan Pan, Yansong Chua, Jibin Wu, Malu Zhang, Haizhou Li, Eliathamby Ambikairajah

    Abstract: The auditory front-end is an integral part of a spiking neural network (SNN) when performing auditory cognitive tasks. It encodes the temporal dynamic stimulus, such as speech and audio, into an efficient, effective and reconstructable spike pattern to facilitate the subsequent processing. However, most of the auditory front-ends in current studies have not made use of recent findings in psychoacousti…

    Submitted 4 September, 2019; v1 submitted 3 September, 2019; originally announced September 2019.