Skip to main content

Showing 1–10 of 10 results for author: Tawara, N

Searching in archive eess. Search in all archives.
.
  1. arXiv:2406.18972  [pdf, ps, other

    eess.AS cs.CL

    Applying LLMs for Rescoring N-best ASR Hypotheses of Casual Conversations: Effects of Domain Adaptation and Context Carry-over

    Authors: Atsunori Ogawa, Naoyuki Kamo, Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Takatomo Kano, Naohiro Tawara, Marc Delcroix

    Abstract: Large language models (LLMs) have been successfully applied for rescoring automatic speech recognition (ASR) hypotheses. However, their ability to rescore ASR hypotheses of casual conversations has not been sufficiently explored. In this study, we reveal it by performing N-best ASR hypotheses rescoring using Llama2 on the CHiME-7 distant ASR (DASR) task. Llama2 is one of the most representative LL… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 5 pages

  2. arXiv:2312.14609  [pdf, ps, other

    eess.AS cs.CL

    BLSTM-Based Confidence Estimation for End-to-End Speech Recognition

    Authors: Atsunori Ogawa, Naohiro Tawara, Takatomo Kano, Marc Delcroix

    Abstract: Confidence estimation, in which we estimate the reliability of each recognized token (e.g., word, sub-word, and character) in automatic speech recognition (ASR) hypotheses and detect incorrectly recognized tokens, is an important function for develo** ASR applications. In this study, we perform confidence estimation for end-to-end (E2E) ASR hypotheses. Recent E2E ASR systems show high performanc… ▽ More

    Submitted 22 December, 2023; originally announced December 2023.

    Comments: Accepted to ICASSP 2021

  3. arXiv:2312.12764  [pdf, ps, other

    eess.AS cs.CL cs.SD

    Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models

    Authors: Atsunori Ogawa, Naohiro Tawara, Marc Delcroix, Shoko Araki

    Abstract: We investigate the effectiveness of using a large ensemble of advanced neural language models (NLMs) for lattice rescoring on automatic speech recognition (ASR) hypotheses. Previous studies have reported the effectiveness of combining a small number of NLMs. In contrast, in this study, we combine up to eight NLMs, i.e., forward/backward long short-term memory/Transformer-LMs that are trained with… ▽ More

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: Accepted to ICASSP 2022

  4. arXiv:2310.11010  [pdf, ps, other

    eess.AS cs.CL

    Iterative Shallow Fusion of Backward Language Model for End-to-End Speech Recognition

    Authors: Atsunori Ogawa, Takafumi Moriya, Naoyuki Kamo, Naohiro Tawara, Marc Delcroix

    Abstract: We propose a new shallow fusion (SF) method to exploit an external backward language model (BLM) for end-to-end automatic speech recognition (ASR). The BLM has complementary characteristics with a forward language model (FLM), and the effectiveness of their combination has been confirmed by rescoring ASR hypotheses as post-processing. In the proposed SF, we iteratively apply the BLM to partial ASR… ▽ More

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: Accepted to ICASSP 2023

  5. arXiv:2310.02732  [pdf, ps, other

    eess.AS cs.SD

    Discriminative Training of VBx Diarization

    Authors: Dominik Klement, Mireia Diez, Federico Landini, Lukáš Burget, Anna Silnova, Marc Delcroix, Naohiro Tawara

    Abstract: Bayesian HMM clustering of x-vector sequences (VBx) has become a widely adopted diarization baseline model in publications and challenges. It uses an HMM to model speaker turns, a generatively trained probabilistic linear discriminant analysis (PLDA) for speaker distribution modeling, and Bayesian inference to estimate the assignment of x-vectors to speakers. This paper presents a new framework fo… ▽ More

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: Submitted to ICASSP 2024

  6. arXiv:2309.12656  [pdf, other

    eess.AS cs.SD

    NTT speaker diarization system for CHiME-7: multi-domain, multi-microphone End-to-end and vector clustering diarization

    Authors: Naohiro Tawara, Marc Delcroix, Atsushi Ando, Atsunori Ogawa

    Abstract: This paper details our speaker diarization system designed for multi-domain, multi-microphone casual conversations. The proposed diarization pipeline uses weighted prediction error (WPE)-based dereverberation as a front end, then applies end-to-end neural diarization with vector clustering (EEND-VC) to each channel separately. It integrates the diarization result obtained from each channel using d… ▽ More

    Submitted 22 September, 2023; originally announced September 2023.

    Comments: 5 pages, 5 figures, Submitted to ICASSP 2024

  7. arXiv:2305.13580  [pdf, other

    eess.AS cs.SD

    Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization

    Authors: Marc Delcroix, Naohiro Tawara, Mireia Diez, Federico Landini, Anna Silnova, Atsunori Ogawa, Tomohiro Nakatani, Lukas Burget, Shoko Araki

    Abstract: Combining end-to-end neural speaker diarization (EEND) with vector clustering (VC), known as EEND-VC, has gained interest for leveraging the strengths of both methods. EEND-VC estimates activities and speaker embeddings for all speakers within an audio chunk and uses VC to associate these activities with speaker identities across different chunks. EEND-VC generates thus multiple streams of embeddi… ▽ More

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted at Interspeech 2023

  8. arXiv:2105.09040  [pdf, other

    eess.AS cs.SD

    Advances in integration of end-to-end neural and clustering-based diarization for real conversational speech

    Authors: Keisuke Kinoshita, Marc Delcroix, Naohiro Tawara

    Abstract: Recently, we proposed a novel speaker diarization method called End-to-End-Neural-Diarization-vector clustering (EEND-vector clustering) that integrates clustering-based and end-to-end neural network-based diarization approaches into one framework. The proposed method combines advantages of both frameworks, i.e. high diarization performance and handling of overlapped speech based on EEND, and robu… ▽ More

    Submitted 31 August, 2021; v1 submitted 19 May, 2021; originally announced May 2021.

    Comments: 5 pages, 1 figure, Interspeech2021. (Update to include a reference to the code)

  9. arXiv:2010.13366  [pdf, other

    eess.AS cs.SD stat.ML

    Integrating end-to-end neural and clustering-based diarization: Getting the best of both worlds

    Authors: Keisuke Kinoshita, Marc Delcroix, Naohiro Tawara

    Abstract: Recent diarization technologies can be categorized into two approaches, i.e., clustering and end-to-end neural approaches, which have different pros and cons. The clustering-based approaches assign speaker labels to speech regions by clustering speaker embeddings such as x-vectors. While it can be seen as a current state-of-the-art approach that works for various challenging data with reasonable r… ▽ More

    Submitted 4 February, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

    Comments: To appear in ICASSP 2021

  10. arXiv:2001.08378  [pdf, ps, other

    eess.AS cs.CL cs.SD

    Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam

    Authors: Marc Delcroix, Tsubasa Ochiai, Katerina Zmolikova, Keisuke Kinoshita, Naohiro Tawara, Tomohiro Nakatani, Shoko Araki

    Abstract: Target speech extraction, which extracts a single target source in a mixture given clues about the target speaker, has attracted increasing attention. We have recently proposed SpeakerBeam, which exploits an adaptation utterance of the target speaker to extract his/her voice characteristics that are then used to guide a neural network towards extracting speech of that speaker. SpeakerBeam presents… ▽ More

    Submitted 23 January, 2020; originally announced January 2020.

    Comments: 5 pages, 3 figures. Submitted to ICASSP 2020