Showing 1–10 of 10 results for author: Koshinaka, T

  1. arXiv:2305.15567  [pdf, other]

    eess.AS

    Generalized domain adaptation framework for parametric back-end in speaker recognition

    Authors: Qiongqiong Wang, Koji Okabe, Kong Aik Lee, Takafumi Koshinaka

    Abstract: State-of-the-art speaker recognition systems comprise a speaker embedding front-end followed by a probabilistic linear discriminant analysis (PLDA) back-end. The effectiveness of these components relies on the availability of a large amount of labeled training data. In practice, it is common for domains (e.g., language, channel, demographic) in which a system is deployed to differ from that in whi…

    Submitted 24 May, 2023; originally announced May 2023.

  2. arXiv:2108.12128  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Task-aware Warping Factors in Mask-based Speech Enhancement

    Authors: Qiongqiong Wang, Kong Aik Lee, Takafumi Koshinaka, Koji Okabe, Hitoshi Yamamoto

    Abstract: This paper proposes the use of two task-aware warping factors in mask-based speech enhancement (SE). One controls the balance between speech-maintenance and noise-removal in training phases, while the other controls SE power applied to specific downstream tasks in testing phases. Our intention is to alleviate the problem that SE systems trained to improve speech quality often fail to improve other…

    Submitted 27 August, 2021; originally announced August 2021.

    Comments: EUSIPCO 2021 (the 29th European Signal Processing Conference)

  3. arXiv:2108.05679  [pdf, other]

    eess.AS cs.SD

    Xi-Vector Embedding for Speaker Recognition

    Authors: Kong Aik Lee, Qiongqiong Wang, Takafumi Koshinaka

    Abstract: We present a Bayesian formulation for deep speaker embedding, wherein the xi-vector is the Bayesian counterpart of the x-vector, taking into account the uncertainty estimate. On the technology front, we offer a simple and straightforward extension to the now widely used x-vector. It consists of an auxiliary neural net predicting the frame-wise uncertainty of the input sequence. We show that the pr…

    Submitted 12 August, 2021; originally announced August 2021.

  4. arXiv:2008.08865  [pdf, other]

    eess.AS cs.SD

    Using Multi-Resolution Feature Maps with Convolutional Neural Networks for Anti-Spoofing in ASV

    Authors: Qiongqiong Wang, Kong Aik Lee, Takafumi Koshinaka

    Abstract: This paper presents a simple but effective method that uses multi-resolution feature maps with convolutional neural networks (CNNs) for anti-spoofing in automatic speaker verification (ASV). The central idea is to alleviate the problem that the feature maps commonly used in anti-spoofing networks are insufficient for building discriminative representations of audio segments, as they are often extr…

    Submitted 20 August, 2020; originally announced August 2020.

    Comments: Odyssey 2020 (The Speaker and Language Recognition Workshop)

  5. arXiv:2008.08815  [pdf, other]

    eess.AS cs.SD

    A Generalized Framework for Domain Adaptation of PLDA in Speaker Recognition

    Authors: Qiongqiong Wang, Koji Okabe, Kong Aik Lee, Takafumi Koshinaka

    Abstract: This paper proposes a generalized framework for domain adaptation of Probabilistic Linear Discriminant Analysis (PLDA) in speaker recognition. It not only includes several existing supervised and unsupervised domain adaptation methods but also makes possible more flexible usage of available data in different domains. In particular, we introduce here the two new techniques described below. (1) Corr…

    Submitted 20 August, 2020; originally announced August 2020.

    Comments: ICASSP 2020 (45th International Conference on Acoustics, Speech, and Signal Processing)

  6. arXiv:1906.08556  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Unleashing the Unused Potential of I-Vectors Enabled by GPU Acceleration

    Authors: Ville Vestman, Kong Aik Lee, Tomi H. Kinnunen, Takafumi Koshinaka

    Abstract: Speaker embeddings are continuous-value vector representations that allow easy comparison between voices of speakers with simple geometric operations. Among others, i-vector and x-vector have emerged as the mainstream methods for speaker embedding. In this paper, we illustrate the use of modern computation platform to harness the benefit of GPU acceleration for i-vector extraction. In particular,…

    Submitted 20 June, 2019; originally announced June 2019.

    Comments: Accepted to Interspeech 2019

  7. arXiv:1904.07386  [pdf, other]

    eess.AS cs.CL cs.SD

    I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences

    Authors: Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, et al. (21 additional authors not shown)

    Abstract: The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE). The latest edition of such joint submission was in SRE 2018, in which the I4U submission was among the best-performing systems. SRE'18 also marks the 10-year anniversary of I4U consortium into NIST SRE series of evaluation. The primary objective of the current paper is to summarize the res…

    Submitted 15 April, 2019; originally announced April 2019.

    Comments: 5 pages

  8. arXiv:1812.10260  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    The CORAL+ Algorithm for Unsupervised Domain Adaptation of PLDA

    Authors: Kong Aik Lee, Qiongqiong Wang, Takafumi Koshinaka

    Abstract: State-of-the-art speaker recognition systems comprise an x-vector (or i-vector) speaker embedding front-end followed by a probabilistic linear discriminant analysis (PLDA) backend. The effectiveness of these components relies on the availability of a large collection of labeled training data. In practice, it is common that the domains (e.g., language, demographic) in which the system are deployed…

    Submitted 20 April, 2020; v1 submitted 26 December, 2018; originally announced December 2018.

    Comments: 5 pages

  9. arXiv:1809.09311  [pdf, ps, other]

    cs.SD eess.AS

    Attention Mechanism in Speaker Recognition: What Does It Learn in Deep Speaker Embedding?

    Authors: Qiongqiong Wang, Koji Okabe, Kong Aik Lee, Hitoshi Yamamoto, Takafumi Koshinaka

    Abstract: This paper presents an experimental study on deep speaker embedding with an attention mechanism that has been found to be a powerful representation learning technique in speaker recognition. In this framework, an attention model works as a frame selector that computes an attention weight for each frame-level feature vector, in accord with which an utterance-level representation is produced at the p…

    Submitted 25 September, 2018; originally announced September 2018.

    Comments: SLT 2018 (Workshop on Spoken Language Technology)

  10. Attentive Statistics Pooling for Deep Speaker Embedding

    Authors: Koji Okabe, Takafumi Koshinaka, Koichi Shinoda

    Abstract: This paper proposes attentive statistics pooling for deep speaker embedding in text-independent speaker verification. In conventional speaker embedding, frame-level features are averaged over all the frames of a single utterance to form an utterance-level feature. Our method utilizes an attention mechanism to give different weights to different frames and generates not only weighted means but also…

    Submitted 24 February, 2019; v1 submitted 29 March, 2018; originally announced March 2018.

    Comments: Proc. Interspeech 2018, pp. 2252-2256. arXiv admin note: text overlap with arXiv:1809.09311