
Showing 1–3 of 3 results for author: Iwamoto, K

  1. arXiv:2404.14860  [pdf, other]

    eess.AS cs.SD

    Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance

    Authors: Tsubasa Ochiai, Kazuma Iwamoto, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri

    Abstract: It is challenging to improve automatic speech recognition (ASR) performance in noisy conditions with a single-channel speech enhancement (SE) front-end. This is generally attributed to the processing distortions caused by the nonlinear processing of single-channel SE front-ends. However, the causes of such degraded ASR performance have not been fully investigated. How to design single-channel SE f…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 13 pages, 6 figures, Submitted to IEEE/ACM Trans. Audio, Speech, and Language Processing

  2. arXiv:2311.11599  [pdf, other]

    eess.AS

    How does end-to-end speech recognition training impact speech enhancement artifacts?

    Authors: Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri

    Abstract: Jointly training a speech enhancement (SE) front-end and an automatic speech recognition (ASR) back-end has been investigated as a way to mitigate the influence of "processing distortion" generated by single-channel SE on ASR. In this paper, we investigate the effect of such joint training on the signal-level characteristics of the enhanced signals from the viewpoint of the decomposed noise a…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: 5 pages, 1 figure, 1 table

  3. arXiv:2201.06685  [pdf, other]

    eess.AS cs.SD

    How Bad Are Artifacts?: Analyzing the Impact of Speech Enhancement Errors on ASR

    Authors: Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri

    Abstract: It is challenging to improve automatic speech recognition (ASR) performance in noisy conditions with single-channel speech enhancement (SE). In this paper, we investigate the causes of ASR performance degradation by decomposing the SE errors using orthogonal projection-based decomposition (OPD). OPD decomposes the SE errors into noise and artifact components. The artifact component is defined as t…

    Submitted 30 March, 2022; v1 submitted 17 January, 2022; originally announced January 2022.

    Comments: 5 pages, 5 figures, submitted to Interspeech 2022