Skip to main content

Showing 1–4 of 4 results for author: Jukic, A

Searching in archive eess. Search in all archives.
.
  1. arXiv:2406.05298  [pdf, other

    eess.AS

    Spectral Codecs: Spectrogram-Based Audio Codecs for High Quality Speech Synthesis

    Authors: Ryan Langman, Ante Jukić, Kunal Dhawan, Nithin Rao Koluguri, Boris Ginsburg

    Abstract: Historically, most speech models in machine-learning have used the mel-spectrogram as a speech representation. Recently, discrete audio tokens produced by neural audio codecs have become a popular alternate speech representation for speech synthesis tasks such as text-to-speech (TTS). However, the data distribution produced by such codecs is too complex for some TTS models to predict, hence requir… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  2. arXiv:2406.04552  [pdf, ps, other

    eess.AS

    Flexible Multichannel Speech Enhancement for Noise-Robust Frontend

    Authors: Ante Jukić, Jagadeesh Balam, Boris Ginsburg

    Abstract: This paper proposes a flexible multichannel speech enhancement system with the main goal of improving robustness of automatic speech recognition (ASR) in noisy conditions. The proposed system combines a flexible neural mask estimator applicable to different channel counts and configurations and a multichannel filter with automatic reference selection. A transform-attend-concatenate layer is propos… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Journal ref: WASPAA 2023

  3. arXiv:2310.12378  [pdf, other

    eess.AS cs.SD

    The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System

    Authors: Tae ** Park, He Huang, Ante Jukic, Kunal Dhawan, Krishna C. Puvvada, Nithin Koluguri, Nikolay Karpov, Aleksandr Laptev, Jagadeesh Balam, Boris Ginsburg

    Abstract: We present the NVIDIA NeMo team's multi-channel speech recognition system for the 7th CHiME Challenge Distant Automatic Speech Recognition (DASR) Task, focusing on the development of a multi-channel, multi-speaker speech recognition system tailored to transcribe speech from distributed microphones and microphone arrays. The system predominantly comprises of the following integral modules: the Spea… ▽ More

    Submitted 18 October, 2023; originally announced October 2023.

    Journal ref: CHiME-7 Workshop 2023

  4. arXiv:2310.12371  [pdf, other

    eess.AS cs.SD

    Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation

    Authors: Tae ** Park, He Huang, Coleman Hooper, Nithin Koluguri, Kunal Dhawan, Ante Jukic, Jagadeesh Balam, Boris Ginsburg

    Abstract: We introduce a sophisticated multi-speaker speech data simulator, specifically engineered to generate multi-speaker speech recordings. A notable feature of this simulator is its capacity to modulate the distribution of silence and overlap via the adjustment of statistical parameters. This capability offers a tailored training environment for develo** neural models suited for speaker diarization… ▽ More

    Submitted 18 October, 2023; originally announced October 2023.

    Journal ref: CHiME-7 Workshop 2023