Showing 1–8 of 8 results for author: Hajavi, A

Searching in archive cs.
  1. arXiv:2303.08026  [pdf, other]

    cs.SD cs.AI eess.AS

    A Study on Bias and Fairness In Deep Speaker Recognition

    Authors: Amirhossein Hajavi, Ali Etemad

    Abstract: With the ubiquity of smart devices that use speaker recognition (SR) systems as a means of authenticating individuals and personalizing their services, fairness of SR systems has become an important point of focus. In this paper we study the notion of fairness in recent SR systems based on 3 popular and relevant definitions, namely Statistical Parity, Equalized Odds, and Equal Opportunity. We exa…

    Submitted 14 March, 2023; originally announced March 2023.

  2. arXiv:2302.02845  [pdf, other]

    cs.SD cs.LG eess.AS

    Audio Representation Learning by Distilling Video as Privileged Information

    Authors: Amirhossein Hajavi, Ali Etemad

    Abstract: Deep audio representation learning using multi-modal audio-visual data often leads to a better performance compared to uni-modal approaches. However, in real-world scenarios both modalities are not always available at the time of inference, leading to performance degradation by models trained for multi-modal inference. In this work, we propose a novel approach for deep audio representation learnin…

    Submitted 6 February, 2023; originally announced February 2023.

  3. arXiv:2207.10006  [pdf, other]

    cs.SD eess.AS

    Fine-grained Early Frequency Attention for Deep Speaker Recognition

    Authors: Amirhossein Hajavi, Ali Etemad

    Abstract: Attention mechanisms have emerged as important tools that boost the performance of deep models by allowing them to focus on key parts of learned embeddings. However, current attention mechanisms used in speaker recognition tasks fail to consider fine-grained information items such as frequency bins in input spectral representations used by the deep networks. To address this issue, we propose the n…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted in IJCNN 2022

  4. arXiv:2009.13480  [pdf, other]

    eess.AS cs.LG cs.SD

    Siamese Capsule Network for End-to-End Speaker Recognition In The Wild

    Authors: Amirhossein Hajavi, Ali Etemad

    Abstract: We propose an end-to-end deep model for speaker verification in the wild. Our model uses thin-ResNet for extracting speaker embeddings from utterances and a Siamese capsule network and dynamic routing as the Back-end to calculate a similarity score between the embeddings. We conduct a series of experiments and comparisons on our model to state-of-the-art solutions, showing that our model outperfor…

    Submitted 28 September, 2020; originally announced September 2020.

    Comments: Submitted to ICASSP2021

  5. arXiv:2009.11394  [pdf, other]

    eess.AS cs.LG cs.SD

    FluentNet: End-to-End Detection of Speech Disfluency with Deep Learning

    Authors: Tedd Kourkounakis, Amirhossein Hajavi, Ali Etemad

    Abstract: Strong presentation skills are valuable and sought-after in workplace and classroom environments alike. Of the possible improvements to vocal presentations, disfluencies and stutters in particular remain one of the most common and prominent factors of someone's demonstration. Millions of people are affected by stuttering and other speech disfluencies, with the majority of the world having experien…

    Submitted 23 September, 2020; originally announced September 2020.

    Comments: 13 pages, 6 figures

  6. arXiv:2009.01822  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Fine-grained Early Frequency Attention for Deep Speaker Representation Learning

    Authors: Amirhossein Hajavi, Ali Etemad

    Abstract: Deep learning techniques have considerably improved speech processing in recent years. Speaker representations extracted by deep learning models are being used in a wide range of tasks such as speaker recognition and speech emotion recognition. Attention mechanisms have started to play an important role in improving deep learning models in the field of speech processing. Nonetheless, despite the f…

    Submitted 24 January, 2023; v1 submitted 3 September, 2020; originally announced September 2020.

  7. arXiv:1910.12590  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Detecting Multiple Speech Disfluencies using a Deep Residual Network with Bidirectional Long Short-Term Memory

    Authors: Tedd Kourkounakis, Amirhossein Hajavi, Ali Etemad

    Abstract: Stuttering is a speech impediment affecting tens of millions of people on an everyday basis. Even with its commonality, there is minimal data and research on the identification and classification of stuttered speech. This paper tackles the problem of detection and classification of different forms of stutter. As opposed to most existing works that identify stutters with language models, our work p…

    Submitted 17 October, 2019; originally announced October 2019.

  8. arXiv:1907.10420  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    A Deep Neural Network for Short-Segment Speaker Recognition

    Authors: Amirhossein Hajavi, Ali Etemad

    Abstract: Today's interactive devices such as smart-phone assistants and smart speakers often deal with short-duration speech segments. As a result, speaker recognition systems integrated into such devices will be much better suited with models capable of performing the recognition task with short-duration utterances. In this paper, a new deep neural network, UtterIdNet, capable of performing speaker recogni…

    Submitted 22 July, 2019; originally announced July 2019.

    Comments: Accepted in Interspeech 2019