Showing 1–10 of 10 results for author: Aldeneh, Z

  1. arXiv:2402.00340  [pdf, other]

    cs.SD eess.AS

    Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?

    Authors: Zakaria Aldeneh, Takuya Higuchi, Jee-weon Jung, Skyler Seto, Tatiana Likhomanenko, Stephen Shum, Ahmed Hussen Abdelaziz, Shinji Watanabe, Barry-John Theobald

    Abstract: Self-supervised features are typically used in place of filter-bank features in speaker verification models. However, these models were originally designed to ingest filter-bank features as inputs, and thus, training them on top of self-supervised features assumes that both feature types require the same amount of learning for the task. In this work, we observe that pre-trained self-supervised spe…

    Submitted 13 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  2. arXiv:2401.17230  [pdf, other]

    cs.SD cs.AI eess.AS

    ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models

    Authors: Jee-weon Jung, Wangyou Zhang, Jiatong Shi, Zakaria Aldeneh, Takuya Higuchi, Barry-John Theobald, Ahmed Hussen Abdelaziz, Shinji Watanabe

    Abstract: This paper introduces ESPnet-SPK, a toolkit designed with several objectives for training speaker embedding extractors. First, we provide an open-source platform for researchers in the speaker recognition community to effortlessly build models. We provide several models, ranging from x-vector to recent SKA-TDNN. Through the modularized architecture design, variants can be developed easily. We also…

    Submitted 13 June, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: 5 pages, 3 figures, 7 tables, Interspeech 2024

  3. arXiv:2308.09514  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Spatial LibriSpeech: An Augmented Dataset for Spatial Audio Learning

    Authors: Miguel Sarabia, Elena Menyaylenko, Alessandro Toso, Skyler Seto, Zakaria Aldeneh, Shadi Pirhosseinloo, Luca Zappella, Barry-John Theobald, Nicholas Apostoloff, Jonathan Sheaffer

    Abstract: We present Spatial LibriSpeech, a spatial audio dataset with over 650 hours of 19-channel audio, first-order ambisonics, and optional distractor noise. Spatial LibriSpeech is designed for machine learning model training, and it includes labels for source position, speaking direction, room acoustics and geometry. Spatial LibriSpeech is generated by augmenting LibriSpeech samples with 200k+ simulate…

    Submitted 18 August, 2023; originally announced August 2023.

    Journal ref: Proceedings of INTERSPEECH (2023), pp. 3724-3728

  4. arXiv:2210.14800  [pdf, other]

    eess.AS cs.HC cs.SD

    Naturalistic Head Motion Generation from Speech

    Authors: Trisha Mittal, Zakaria Aldeneh, Masha Fedzechkina, Anurag Ranjan, Barry-John Theobald

    Abstract: Synthesizing natural head motion to accompany speech for an embodied conversational agent is necessary for providing a rich interactive experience. Most prior works assess the quality of generated head motion by comparing them against a single ground-truth using an objective metric. Yet there are many plausible head motion sequences to accompany a speech utterance. In this work, we study the varia…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: Submitted to ICASSP 2023

  5. arXiv:2203.10117  [pdf, other]

    cs.SD cs.CV cs.GR eess.AS

    On the role of Lip Articulation in Visual Speech Perception

    Authors: Zakaria Aldeneh, Masha Fedzechkina, Skyler Seto, Katherine Metcalf, Miguel Sarabia, Nicholas Apostoloff, Barry-John Theobald

    Abstract: Generating realistic lip motion from audio to simulate speech production is critical for driving natural character animation. Previous research has shown that traditional metrics used to optimize and assess models for generating lip motion from speech are not a good indicator of subjective opinion of animation quality. Devising metrics that align with subjective opinion first requires understandin…

    Submitted 10 November, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

    Comments: Submitted to ICASSP 2023

  6. Aphasic Speech Recognition using a Mixture of Speech Intelligibility Experts

    Authors: Matthew Perez, Zakaria Aldeneh, Emily Mower Provost

    Abstract: Robust speech recognition is a key prerequisite for semantic feature extraction in automatic aphasic speech analysis. However, standard one-size-fits-all automatic speech recognition models perform poorly when applied to aphasic speech. One reason for this is the wide range of speech intelligibility due to different levels of severity (i.e., higher severity lends itself to less intelligible speech…

    Submitted 24 August, 2020; originally announced August 2020.

    Comments: 4 pages

  7. arXiv:2004.12031  [pdf, ps, other]

    cs.LG cs.CL cs.CV cs.SD eess.AS

    On the Role of Visual Cues in Audiovisual Speech Enhancement

    Authors: Zakaria Aldeneh, Anushree Prasanna Kumar, Barry-John Theobald, Erik Marchi, Sachin Kajarekar, Devang Naik, Ahmed Hussen Abdelaziz

    Abstract: We present an introspection of an audiovisual speech enhancement model. In particular, we focus on interpreting how a neural audiovisual speech enhancement model uses visual cues to improve the quality of the target speech signal. We show that visual cues provide not only high-level information about speech activity, i.e., speech/silence, but also fine-grained visual information about the place of…

    Submitted 25 February, 2021; v1 submitted 24 April, 2020; originally announced April 2020.

    Comments: ICASSP 2021

  8. arXiv:1910.05115  [pdf, ps, other]

    eess.AS cs.SD q-bio.NC

    Identifying Mood Episodes Using Dialogue Features from Clinical Interviews

    Authors: Zakaria Aldeneh, Mimansa Jaiswal, Michael Picheny, Melvin McInnis, Emily Mower Provost

    Abstract: Bipolar disorder, a severe chronic mental illness characterized by pathological mood swings from depression to mania, requires ongoing symptom severity tracking to both guide and measure treatments that are critical for maintaining long-term health. Mental health professionals assess symptom severity through semi-structured clinical interviews. During these interviews, they observe their patients'…

    Submitted 24 March, 2022; v1 submitted 28 September, 2019; originally announced October 2019.

  9. arXiv:1908.08979  [pdf, other]

    cs.LG cs.CL cs.SD eess.AS stat.ML

    Controlling for Confounders in Multimodal Emotion Classification via Adversarial Learning

    Authors: Mimansa Jaiswal, Zakaria Aldeneh, Emily Mower Provost

    Abstract: Various psychological factors affect how individuals express emotions. Yet, when we collect data intended for use in building emotion recognition systems, we often try to do so by creating paradigms that are designed just with a focus on eliciting emotional behavior. Algorithms trained with these types of data are unlikely to function outside of controlled environments because our emotions natural…

    Submitted 23 August, 2019; originally announced August 2019.

    Comments: 10 pages, ICMI 2019

  10. arXiv:1903.11672  [pdf, other]

    cs.SD cs.HC cs.LG eess.AS

    MuSE-ing on the Impact of Utterance Ordering On Crowdsourced Emotion Annotations

    Authors: Mimansa Jaiswal, Zakaria Aldeneh, Cristian-Paul Bara, Yuanhang Luo, Mihai Burzo, Rada Mihalcea, Emily Mower Provost

    Abstract: Emotion recognition algorithms rely on data annotated with high quality labels. However, emotion expression and perception are inherently subjective. There is generally not a single annotation that can be unambiguously declared "correct". As a result, annotations are colored by the manner in which they were collected. In this paper, we conduct crowdsourcing experiments to investigate this impact o…

    Submitted 27 March, 2019; originally announced March 2019.

    Comments: 5 pages, ICASSP 2019