
Showing 1–50 of 74 results for author: Kinnunen, T

  1. arXiv:2406.17246

    cs.SD cs.AI eess.AS

    Beyond Silence: Bias Analysis through Loss and Asymmetric Approach in Audio Anti-Spoofing

    Authors: Hye-** Shim, Md Sahidullah, Jee-weon Jung, Shinji Watanabe, Tomi Kinnunen

    Abstract: Current trends in audio anti-spoofing detection research strive to improve models' ability to generalize across unseen attacks by learning to identify a variety of spoofing artifacts. This emphasis has primarily focused on the spoof class. Recently, several studies have noted that the distribution of silence differs between the two classes, which can serve as a shortcut. In this paper, we extend c…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 5 pages, 1 figure, 5 tables

  2. arXiv:2406.10836

    eess.AS cs.SD

    Revisiting and Improving Scoring Fusion for Spoofing-aware Speaker Verification Using Compositional Data Analysis

    Authors: Xin Wang, Tomi Kinnunen, Kong Aik Lee, Paul-Gauthier Noé, Junichi Yamagishi

    Abstract: Fusing outputs from automatic speaker verification (ASV) and spoofing countermeasure (CM) is expected to make an integrated system robust to zero-effort imposters and synthesized spoofing attacks. Many score-level fusion methods have been proposed, but many remain heuristic. This paper revisits score-level fusion using tools from decision theory and presents three main findings. First, fusion by s…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024 Accepted. https://github.com/nii-yamagishilab/SpeechSPC-mini

  3. arXiv:2406.09999

    eess.AS

    ROAR: Reinforcing Original to Augmented Data Ratio Dynamics for Wav2Vec2.0 Based ASR

    Authors: Vishwanath Pratap Singh, Federico Malato, Ville Hautamaki, Md. Sahidullah, Tomi Kinnunen

    Abstract: While automatic speech recognition (ASR) greatly benefits from data augmentation, the augmentation recipes themselves tend to be heuristic. In this paper, we address one of the heuristic approaches associated with balancing the right amount of augmented data in ASR training by introducing a reinforcement learning (RL) based dynamic adjustment of original-to-augmented data ratio (OAR). Unlike the fix…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted: Interspeech 2024

    Journal ref: Interspeech 2024

  4. arXiv:2403.01355

    eess.AS cs.LG

    a-DCF: an architecture agnostic metric with application to spoofing-robust speaker verification

    Authors: Hye-** Shim, Jee-weon Jung, Tomi Kinnunen, Nicholas Evans, Jean-Francois Bonastre, Itshak Lapidot

    Abstract: Spoofing detection is today a mainstream research topic. Standard metrics can be applied to evaluate the performance of isolated spoofing detection solutions and others have been proposed to support their evaluation when they are combined with speaker detection. These either have well-known deficiencies or restrict the architectural approach to combine speaker and spoof detectors. In this paper, w…

    Submitted 2 March, 2024; originally announced March 2024.

    Comments: 8 pages, submitted to Speaker Odyssey 2024

  5. arXiv:2402.15214

    eess.AS cs.SD

    ChildAugment: Data Augmentation Methods for Zero-Resource Children's Speaker Verification

    Authors: Vishwanath Pratap Singh, Md Sahidullah, Tomi Kinnunen

    Abstract: The accuracy of modern automatic speaker verification (ASV) systems, when trained exclusively on adult data, drops substantially when applied to children's speech. The scarcity of children's speech corpora hinders fine-tuning ASV systems for children's speech. Hence, there is a timely need to explore more effective ways of reusing adults' speech data. One promising approach is to align vocal-tract…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: The following article has been accepted by The Journal of the Acoustical Society of America (JASA). After it is published, it will be found at https://pubs.aip.org/asa/jasa

  6. arXiv:2401.11156

    cs.CR cs.AI cs.SD eess.AS

    Generalizing Speaker Verification for Spoof Awareness in the Embedding Space

    Authors: Xuechen Liu, Md Sahidullah, Kong Aik Lee, Tomi Kinnunen

    Abstract: It is now well-known that automatic speaker verification (ASV) systems can be spoofed using various types of adversaries. The usual approach to counteract ASV systems against such attacks is to develop a separate spoofing countermeasure (CM) module to classify speech input either as a bonafide, or a spoofed utterance. Nevertheless, such a design requires additional computation and utilization effo…

    Submitted 27 January, 2024; v1 submitted 20 January, 2024; originally announced January 2024.

    Comments: Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing (doi updated)

  7. arXiv:2309.12237

    cs.CR cs.LG cs.SD eess.AS eess.IV stat.CO

    t-EER: Parameter-Free Tandem Evaluation of Countermeasures and Biometric Comparators

    Authors: Tomi Kinnunen, Kong Aik Lee, Hemlata Tak, Nicholas Evans, Andreas Nautsch

    Abstract: Presentation attack (spoofing) detection (PAD) typically operates alongside biometric verification to improve reliability in the face of spoofing attacks. Even though the two sub-systems operate in tandem to solve the single task of reliable biometric verification, they address different detection tasks and are hence typically evaluated separately. Evidence shows that this approach is suboptimal. W…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence. For associated codes, see https://github.com/TakHemlata/T-EER (Github) and https://colab.research.google.com/drive/1ga7eiKFP11wOFMuZjThLJlkBcwEG6_4m?usp=sharing (Google Colab)

  8. arXiv:2306.07501

    eess.AS cs.SD

    Speaker Verification Across Ages: Investigating Deep Speaker Embedding Sensitivity to Age Mismatch in Enrollment and Test Speech

    Authors: Vishwanath Pratap Singh, Md Sahidullah, Tomi Kinnunen

    Abstract: In this paper, we study the impact of ageing on modern deep speaker embedding based automatic speaker verification (ASV) systems. We have selected two different datasets to examine ageing on the state-of-the-art ECAPA-TDNN system. The first dataset, used for addressing short-term ageing (up to 10 years time difference between enrollment and test) under uncontrolled conditions, is VoxCeleb. The…

    Submitted 12 June, 2023; originally announced June 2023.

    Journal ref: Interspeech 2023

  9. arXiv:2306.00044

    cs.LG cs.CR cs.SD eess.AS

    How to Construct Perfect and Worse-than-Coin-Flip Spoofing Countermeasures: A Word of Warning on Shortcut Learning

    Authors: Hye-** Shim, Rosa González Hautamäki, Md Sahidullah, Tomi Kinnunen

    Abstract: Shortcut learning, or the 'Clever Hans effect', refers to situations where a learning agent (e.g., deep neural networks) learns spurious correlations present in data, resulting in biased models. We focus on finding shortcuts in deep learning based spoofing countermeasures (CMs) that predict whether a given utterance is spoofed or not. While prior work has addressed specific data artifacts, such as sile…

    Submitted 31 May, 2023; originally announced June 2023.

    Comments: Interspeech 2023

  10. arXiv:2305.19953

    cs.SD cs.LG eess.AS

    Multi-Dataset Co-Training with Sharpness-Aware Optimization for Audio Anti-spoofing

    Authors: Hye-** Shim, Jee-weon Jung, Tomi Kinnunen

    Abstract: Audio anti-spoofing for automatic speaker verification aims to safeguard users' identities from spoofing attacks. Although state-of-the-art spoofing countermeasure (CM) models perform well on specific datasets, they lack generalization when evaluated with different datasets. To address this limitation, previous studies have explored large pre-trained models, which require significant resources and…

    Submitted 1 June, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: Interspeech 2023

  11. arXiv:2305.19051

    eess.AS cs.AI cs.SD

    Towards single integrated spoofing-aware speaker verification embeddings

    Authors: Sung Hwan Mun, Hye-** Shim, Hemlata Tak, Xin Wang, Xuechen Liu, Md Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung

    Abstract: This study aims to develop a single integrated spoofing-aware speaker verification (SASV) embedding that satisfies two requirements. First, it should reject both non-target speakers' inputs and target speakers' spoofed inputs. Second, it should demonstrate competitive performance compared to the fusion of automatic speaker verification (ASV) and countermeasure (CM) embeddings, which outpe…

    Submitted 1 June, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted by INTERSPEECH 2023. Code and models are available at https://github.com/sasv-challenge/ASVSpoof5-SASVBaseline

  12. arXiv:2303.01126

    cs.SD cs.CR eess.AS

    Speaker-Aware Anti-Spoofing

    Authors: Xuechen Liu, Md Sahidullah, Kong Aik Lee, Tomi Kinnunen

    Abstract: We address speaker-aware anti-spoofing, where prior knowledge of the target speaker is incorporated into a voice spoofing countermeasure (CM). In contrast to the frequently used speaker-independent solutions, we train the CM in a speaker-conditioned way. As a proof of concept, we consider speaker-aware extension to the state-of-the-art AASIST (audio anti-spoofing using integrated spectro-temporal…

    Submitted 8 June, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

  13. arXiv:2303.01125

    cs.SD cs.LG eess.AS

    Distilling Multi-Level X-vector Knowledge for Small-footprint Speaker Verification

    Authors: Xuechen Liu, Md Sahidullah, Tomi Kinnunen

    Abstract: Even though deep speaker models have demonstrated impressive accuracy in speaker verification tasks, this often comes at the expense of increased model size and computation time, presenting challenges for deployment in resource-constrained environments. Our research focuses on addressing this limitation through the development of small footprint deep speaker embedding extraction using knowledge di…

    Submitted 19 December, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: Submitted to Data & Knowledge Engineering in Dec. 2023. Copyright may be transferred without notice

  14. arXiv:2302.10014

    eess.AS

    Learnable Frontends that do not Learn: Quantifying Sensitivity to Filterbank Initialisation

    Authors: Mark Anderson, Tomi Kinnunen, Naomi Harte

    Abstract: While much of modern speech and audio processing relies on deep neural networks trained using fixed audio representations, recent studies suggest great potential in acoustic frontends learnt jointly with a backend. In this study, we focus specifically on learnable filterbanks. Prior studies have reported that in frontends using learnable filterbanks initialised to a mel scale, the learned filters…

    Submitted 20 February, 2023; originally announced February 2023.

    Comments: Accepted at ICASSP 2023, 5 pages, 2 figures, 2 tables

  15. arXiv:2211.01091

    eess.AS cs.AI cs.SD

    I4U System Description for NIST SRE'20 CTS Challenge

    Authors: Kong Aik Lee, Tomi Kinnunen, Daniele Colibro, Claudio Vair, Andreas Nautsch, Hanwu Sun, Liang He, Tianyu Liang, Qiongqiong Wang, Mickael Rouvier, Pierre-Michel Bousquet, Rohan Kumar Das, Ignacio Viñals Bailo, Meng Liu, Héctor Delgado, Xuechen Liu, Md Sahidullah, Sandro Cumani, Boning Zhang, Koji Okabe, Hitoshi Yamamoto, Ruijie Tao, Haizhou Li, Alfonso Ortega Giménez, Longbiao Wang, et al. (1 additional authors not shown)

    Abstract: This manuscript describes the I4U submission to the 2020 NIST Speaker Recognition Evaluation (SRE'20) Conversational Telephone Speech (CTS) Challenge. The I4U submission resulted from active collaboration among researchers across eight research teams: I²R (Singapore), UEF (Finland), VALPT (Italy, Spain), NEC (Japan), THUEE (China), LIA (France), NUS (Singapore), INRIA (France) and TJU (C…

    Submitted 2 November, 2022; originally announced November 2022.

    Comments: SRE 2021, NIST Speaker Recognition Evaluation Workshop, CTS Speaker Recognition Challenge, 14-12 December 2021

  16. arXiv:2210.02437

    cs.SD cs.CR cs.MM eess.AS

    ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild

    Authors: Xuechen Liu, Xin Wang, Md Sahidullah, Jose Patino, Héctor Delgado, Tomi Kinnunen, Massimiliano Todisco, Junichi Yamagishi, Nicholas Evans, Andreas Nautsch, Kong Aik Lee

    Abstract: Benchmarking initiatives support the meaningful comparison of competing solutions to prominent problems in speech and language processing. Successive benchmarking evaluations typically reflect a progressive evolution from ideal lab conditions towards those encountered in the wild. ASVspoof, the spoofing and deepfake detection initiative and challenge series, has followed the same trend. This ar…

    Submitted 22 June, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing

  17. arXiv:2209.10479

    eess.AS cs.SD eess.SP

    An Initial study on Birdsong Re-synthesis Using Neural Vocoders

    Authors: Rhythm Bhatia, Tomi H. Kinnunen

    Abstract: Modern speech synthesis uses neural vocoders to model raw waveform samples directly. This increased versatility has expanded the scope of vocoders from speech to other domains, such as music. We address another interesting domain of bio-acoustics. We provide initial comparative analysis-resynthesis experiments of birdsong using traditional (WORLD) and two neural (WaveNet autoencoder, parallel Wave…

    Submitted 21 September, 2022; originally announced September 2022.

    Comments: To appear in the 24th International Conference on Speech and Computer (SPECOM), Gurugram, India

  18. arXiv:2205.07060

    cs.AI cs.CR cs.LG

    GAN-Aimbots: Using Machine Learning for Cheating in First Person Shooters

    Authors: Anssi Kanervisto, Tomi Kinnunen, Ville Hautamäki

    Abstract: Playing games with cheaters is not fun, and in a multi-billion-dollar video game industry with hundreds of millions of players, game developers aim to improve the security and, consequently, the user experience of their games by preventing cheating. Both traditional software-based methods and statistical systems have been successful in protecting against cheating, but recent advances in the automa…

    Submitted 14 May, 2022; originally announced May 2022.

    Comments: Accepted to IEEE Transactions on Games. Source code available at https://github.com/miffyli/gan-aimbots

  19. arXiv:2205.04923

    cs.SD eess.AS

    Gamified Speaker Comparison by Listening

    Authors: Sandip Ghimire, Tomi Kinnunen, Rosa Gonzalez Hautamäki

    Abstract: We address speaker comparison by listening in a game-like environment, hypothesized to make the task more motivating for naive listeners. We present the same 30 trials selected with the help of an x-vector speaker recognition system from VoxCeleb to a total of 150 crowdworkers recruited through Amazon's Mechanical Turk. They are divided into cohorts of 50, each using one of three alternative inter…

    Submitted 10 May, 2022; originally announced May 2022.

    Comments: Accepted to Odyssey 2022: The Speaker and Language Recognition Workshop

  20. arXiv:2205.00288

    eess.AS cs.SD

    Baselines and Protocols for Household Speaker Recognition

    Authors: Alexey Sholokhov, Xuechen Liu, Md Sahidullah, Tomi Kinnunen

    Abstract: Speaker recognition on household devices, such as smart speakers, features several challenges: (i) robustness across a vast number of heterogeneous domains (households), (ii) short utterances, (iii) possibly absent speaker labels of the enrollment data (passive enrollment), and (iv) presence of unknown persons (guests). While many commercial products exist, there is less published research and no…

    Submitted 5 May, 2022; v1 submitted 30 April, 2022; originally announced May 2022.

    Comments: Accepted to Odyssey 2022

  21. arXiv:2204.09976

    cs.SD eess.AS

    Baseline Systems for the First Spoofing-Aware Speaker Verification Challenge: Score and Embedding Fusion

    Authors: Hye-** Shim, Hemlata Tak, Xuechen Liu, Hee-Soo Heo, Jee-weon Jung, Joon Son Chung, Soo-Whan Chung, Ha-** Yu, Bong-** Lee, Massimiliano Todisco, Héctor Delgado, Kong Aik Lee, Md Sahidullah, Tomi Kinnunen, Nicholas Evans

    Abstract: Deep learning has brought impressive progress in the study of both automatic speaker verification (ASV) and spoofing countermeasures (CM). Although solutions are mutually dependent, they have typically evolved as standalone sub-systems whereby CM solutions are usually designed for a fixed ASV system. The work reported in this paper aims to gauge the improvements in reliability that can be gained f…

    Submitted 21 April, 2022; originally announced April 2022.

    Comments: 8 pages, accepted by Odyssey 2022

  22. Improving speaker de-identification with functional data analysis of f0 trajectories

    Authors: Lauri Tavi, Tomi Kinnunen, Rosa González Hautamäki

    Abstract: Due to the constantly increasing amount of speech data stored in different types of databases, voice privacy has become a major concern. To respond to this concern, speech researchers have developed various methods for speaker de-identification. State-of-the-art solutions utilize deep learning, which can be effective but might be unavailable or impractical to apply for, for exam…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted to Speech Communication. March 2022

  23. arXiv:2203.14732

    eess.AS

    SASV 2022: The First Spoofing-Aware Speaker Verification Challenge

    Authors: Jee-weon Jung, Hemlata Tak, Hye-** Shim, Hee-Soo Heo, Bong-** Lee, Soo-Whan Chung, Ha-** Yu, Nicholas Evans, Tomi Kinnunen

    Abstract: The first spoofing-aware speaker verification (SASV) challenge aims to integrate research efforts in speaker verification and anti-spoofing. We extend the speaker verification scenario by introducing spoofed trials to the usual set of target and impostor trials. In contrast to the established ASVspoof challenge where the focus is upon separate, independently optimised spoofing detection and speake…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: 5 pages, 2 figures, 2 tables, submitted to Interspeech 2022 as a conference paper

  24. arXiv:2203.10992

    cs.SD cs.AI eess.AS

    Spoofing-Aware Speaker Verification with Unsupervised Domain Adaptation

    Authors: Xuechen Liu, Md Sahidullah, Tomi Kinnunen

    Abstract: In this paper, we address the problem of enhancing the spoofing robustness of the automatic speaker verification (ASV) system without relying primarily on a separate countermeasure module. We start from the standard ASV framework of the ASVspoof 2019 baseline and approach the problem from the back-end classifier based on probabilistic linear discriminant analysis. We employ three unsupervised…

    Submitted 26 April, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: Accepted by Speaker Odyssey 2022

  25. arXiv:2202.05236

    cs.SD cs.AI eess.AS

    Learnable Nonlinear Compression for Robust Speaker Verification

    Authors: Xuechen Liu, Md Sahidullah, Tomi Kinnunen

    Abstract: In this study, we focus on nonlinear compression methods for spectral features in deep neural network based speaker verification. We consider different kinds of channel-dependent (CD) nonlinear compression methods optimized in a data-driven manner. Our methods are based on power nonlinearities and dynamic range compression (DRC). We also propose multi-regime (MR) design on the nonlinearities, a…

    Submitted 10 February, 2022; originally announced February 2022.

    Comments: Accepted by ICASSP 2022

  26. arXiv:2201.10283

    cs.SD cs.CR eess.AS

    SASV Challenge 2022: A Spoofing Aware Speaker Verification Challenge Evaluation Plan

    Authors: Jee-weon Jung, Hemlata Tak, Hye-** Shim, Hee-Soo Heo, Bong-** Lee, Soo-Whan Chung, Hong-Goo Kang, Ha-** Yu, Nicholas Evans, Tomi Kinnunen

    Abstract: ASV (automatic speaker verification) systems are intrinsically required to reject both non-target (e.g., voice uttered by a different speaker) and spoofed (e.g., synthesised or converted) inputs. However, there is little consideration for how ASV systems themselves should be adapted when they are expected to encounter spoofing attacks, nor when they operate in tandem with CMs (spoofing countermeasur…

    Submitted 2 March, 2022; v1 submitted 25 January, 2022; originally announced January 2022.

    Comments: Evaluation plan of the SASV Challenge 2022. See this webpage for more information: https://sasv-challenge.github.io

  27. arXiv:2201.09709

    cs.SD cs.CR cs.LG eess.AS

    Optimizing Tandem Speaker Verification and Anti-Spoofing Systems

    Authors: Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen, Junichi Yamagishi

    Abstract: As automatic speaker verification (ASV) systems are vulnerable to spoofing attacks, they are typically used in conjunction with spoofing countermeasure (CM) systems to improve security. For example, the CM can first determine whether the input is human speech, then the ASV can determine whether this speech matches the speaker's identity. The performance of such a tandem system can be measured with…

    Submitted 24 January, 2022; originally announced January 2022.

    Comments: Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing. Published version available at: https://ieeexplore.ieee.org/document/9664367

    Journal ref: in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 477-488, 2022

  28. arXiv:2110.10983

    cs.SD cs.AI eess.AS

    Optimizing Multi-Taper Features for Deep Speaker Verification

    Authors: Xuechen Liu, Md Sahidullah, Tomi Kinnunen

    Abstract: Multi-taper estimators provide low-variance power spectrum estimates that can be used in place of the windowed discrete Fourier transform (DFT) to extract speech features such as mel-frequency cepstral coefficients (MFCCs). Even though past work has reported promising automatic speaker verification (ASV) results with Gaussian mixture model-based classifiers, the performance of multi-taper MFCCs with d…

    Submitted 21 October, 2021; originally announced October 2021.

    Comments: To appear in IEEE Signal Processing Letters

  29. arXiv:2109.13510

    cs.LG cs.CL cs.SD eess.AS

    VoxCeleb Enrichment for Age and Gender Recognition

    Authors: Khaled Hechmi, Trung Ngo Trong, Ville Hautamaki, Tomi Kinnunen

    Abstract: VoxCeleb datasets are widely used in speaker recognition studies. Our work serves two purposes. First, we provide speaker age labels and (an alternative) annotation of speaker gender. Second, we demonstrate the use of this metadata by constructing age and gender recognition models with different features and classifiers. We query different celebrity databases and apply consensus rules to derive ag…

    Submitted 20 December, 2021; v1 submitted 28 September, 2021; originally announced September 2021.

    Comments: Accepted for presentation at ASRU 2021; repository: https://github.com/hechmik/voxceleb_enrichment_age_gender

  30. arXiv:2109.12058

    cs.SD cs.AI eess.AS

    Optimized Power Normalized Cepstral Coefficients towards Robust Deep Speaker Verification

    Authors: Xuechen Liu, Md Sahidullah, Tomi Kinnunen

    Abstract: After their introduction to robust speech recognition, power normalized cepstral coefficient (PNCC) features were successfully adopted for other tasks, including speaker verification. However, as a feature extractor with long-term operations on the power spectrogram, its temporal processing and amplitude scaling steps dedicated to environmental compensation may be redundant. Further, they might sup…

    Submitted 24 September, 2021; originally announced September 2021.

    Comments: Accepted for publication at ASRU 2021

  31. arXiv:2109.12056

    cs.SD cs.AI eess.AS

    Parameterized Channel Normalization for Far-field Deep Speaker Verification

    Authors: Xuechen Liu, Md Sahidullah, Tomi Kinnunen

    Abstract: We address far-field speaker verification with a deep neural network (DNN) based speaker embedding extractor, where mismatch between enrollment and test data often comes from convolutive effects (e.g. room reverberation) and noise. To mitigate these effects, we focus on two parametric normalization methods: per-channel energy normalization (PCEN) and parameterized cepstral mean normalization (PCMN).…

    Submitted 24 September, 2021; originally announced September 2021.

    Comments: Accepted for publication at ASRU 2021

  32. arXiv:2109.00537

    eess.AS cs.CR cs.LG cs.SD

    ASVspoof 2021: accelerating progress in spoofed and deepfake speech detection

    Authors: Junichi Yamagishi, Xin Wang, Massimiliano Todisco, Md Sahidullah, Jose Patino, Andreas Nautsch, Xuechen Liu, Kong Aik Lee, Tomi Kinnunen, Nicholas Evans, Héctor Delgado

    Abstract: ASVspoof 2021 is the fourth edition in the series of biennial challenges which aim to promote the study of spoofing and the design of countermeasures to protect automatic speaker verification systems from manipulation. In addition to a continued focus upon logical and physical access tasks in which there are a number of advances compared to previous editions, ASVspoof 2021 introduces a new task in…

    Submitted 1 September, 2021; originally announced September 2021.

    Comments: Accepted to the ASVspoof 2021 Workshop

  33. arXiv:2109.00535

    eess.AS cs.CR cs.LG cs.SD

    ASVspoof 2021: Automatic Speaker Verification Spoofing and Countermeasures Challenge Evaluation Plan

    Authors: Héctor Delgado, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee, Xuechen Liu, Andreas Nautsch, Jose Patino, Md Sahidullah, Massimiliano Todisco, Xin Wang, Junichi Yamagishi

    Abstract: The automatic speaker verification spoofing and countermeasures (ASVspoof) challenge series is a community-led initiative which aims to promote the consideration of spoofing and the development of countermeasures. ASVspoof 2021 is the 4th in a series of biennial, competitive challenges where the goal is to develop countermeasures capable of discriminating between bona fide and spoofed or deepfake…

    Submitted 1 September, 2021; originally announced September 2021.

    Comments: http://www.asvspoof.org

  34. arXiv:2109.00281

    cs.CR cs.SD eess.AS

    Benchmarking and challenges in security and privacy for voice biometrics

    Authors: Jean-Francois Bonastre, Hector Delgado, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee, Xuechen Liu, Andreas Nautsch, Paul-Gauthier Noe, Jose Patino, Md Sahidullah, Brij Mohan Lal Srivastava, Massimiliano Todisco, Natalia Tomashenko, Emmanuel Vincent, Xin Wang, Junichi Yamagishi

    Abstract: For many decades, research in speech technologies has focused upon improving reliability. With this now meeting user expectations for a range of diverse applications, speech technology is today omnipresent. As a result, a focus on security and privacy has now come to the fore. Here, the research effort is in its relative infancy and progress calls for greater, multidisciplinary collaboration with s…

    Submitted 1 September, 2021; originally announced September 2021.

    Comments: Submitted to the symposium of the ISCA Security & Privacy in Speech Communications (SPSC) special interest group

  35. arXiv:2106.06362

    cs.SD cs.LG eess.AS stat.AP

    Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing

    Authors: Tomi Kinnunen, Andreas Nautsch, Md Sahidullah, Nicholas Evans, Xin Wang, Massimiliano Todisco, Héctor Delgado, Junichi Yamagishi, Kong Aik Lee

    Abstract: Whether it be for results summarization, or the analysis of classifier fusion, some means to compare different classifiers can often provide illuminating insight into their behaviour, (dis)similarity or complementarity. We propose a simple method to derive a 2D representation from detection scores produced by an arbitrary set of binary classifiers in response to a common dataset. Based upon rank cor…

    Submitted 11 June, 2021; originally announced June 2021.

    Comments: Accepted to Interspeech 2021. Example code available at https://github.com/asvspoof-challenge/classifier-adjacency

  36. arXiv:2103.14602

    eess.AS cs.CV cs.LG cs.SD

    Data Quality as Predictor of Voice Anti-Spoofing Generalization

    Authors: Bhusan Chettri, Rosa González Hautamäki, Md Sahidullah, Tomi Kinnunen

    Abstract: Voice anti-spoofing aims at classifying a given utterance either as a bonafide human sample, or a spoofing attack (e.g. synthetic or replayed sample). Many anti-spoofing methods have been proposed but most of them fail to generalize across domains (corpora), and we do not know why. We outline a novel interpretative framework for gauging the impact of data quality upon anti-spoofing perfor…

    Submitted 21 June, 2021; v1 submitted 26 March, 2021; originally announced March 2021.

    Comments: INTERSPEECH 2021

  37. arXiv:2102.10322

    cs.SD cs.LG eess.AS

    Learnable MFCCs for Speaker Verification

    Authors: Xuechen Liu, Md Sahidullah, Tomi Kinnunen

    Abstract: We propose a learnable mel-frequency cepstral coefficient (MFCC) frontend architecture for deep neural network (DNN) based automatic speaker verification. Our architecture retains the simplicity and interpretability of MFCC-based features while allowing the model to be adapted to data flexibly. In practice, we formulate data-driven versions of the four linear transforms of a standard MFCC extracto…

    Submitted 20 February, 2021; originally announced February 2021.

    Comments: Accepted to ISCAS 2021

  38. arXiv:2102.05889

    eess.AS cs.CR cs.SD

    ASVspoof 2019: spoofing countermeasures for the detection of synthesized, converted and replayed speech

    Authors: Andreas Nautsch, Xin Wang, Nicholas Evans, Tomi Kinnunen, Ville Vestman, Massimiliano Todisco, Héctor Delgado, Md Sahidullah, Junichi Yamagishi, Kong Aik Lee

    Abstract: The ASVspoof initiative was conceived to spearhead research in anti-spoofing for automatic speaker verification (ASV). This paper describes the third in a series of biennial challenges: ASVspoof 2019. With the challenge database and protocols being described elsewhere, the focus of this paper is on results and the top performing single and ensemble system submissions from 62 teams, all of which o…

    Submitted 11 February, 2021; originally announced February 2021.

    Journal ref: IEEE Transactions on Biometrics, Behavior, and Identity Science 2021

  39. arXiv:2012.01244

    cs.AI cs.NE

    General Characterization of Agents by States they Visit

    Authors: Anssi Kanervisto, Tomi Kinnunen, Ville Hautamäki

    Abstract: Behavioural characterizations (BCs) of decision-making agents, or their policies, are used to study outcomes of training algorithms and as part of the algorithms themselves to encourage unique policies, match an expert policy, or restrict changes to policy per update. However, previously presented solutions are not applicable in general, either due to a lack of expressive power, computational constraint…

    Submitted 28 October, 2021; v1 submitted 2 December, 2020; originally announced December 2020.

    Comments: Deep Reinforcement Learning Workshop, NeurIPS 2021

  40. arXiv:2009.03554

    eess.AS cs.SD

    Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions

    Authors: Rohan Kumar Das, Tomi Kinnunen, Wen-Chin Huang, Zhenhua Ling, Junichi Yamagishi, Yi Zhao, Xiaohai Tian, Tomoki Toda

    Abstract: The Voice Conversion Challenge 2020 is the third edition of the challenge and promotes intra-lingual semi-parallel and cross-lingual voice conversion (VC). While the primary evaluation of the challenge submissions was done through crowd-sourced listening tests, we also performed an objective assessment of the submitted systems. The aim of the objective assessment is to provide complementary perf…

    Submitted 8 September, 2020; originally announced September 2020.

    Comments: Submitted to ISCA Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020

  41. arXiv:2008.12527

    eess.AS cs.SD

    Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion

    Authors: Yi Zhao, Wen-Chin Huang, Xiaohai Tian, Junichi Yamagishi, Rohan Kumar Das, Tomi Kinnunen, Zhenhua Ling, Tomoki Toda

    Abstract: The Voice Conversion Challenge is a biennial scientific event held to compare and understand different voice conversion (VC) systems built on a common dataset. In 2020, we organized the third edition of the challenge and constructed and distributed a new database for two tasks: intra-lingual semi-parallel and cross-lingual VC. After a two-month challenge period, we received 33 submissions, includ…

    Submitted 28 August, 2020; originally announced August 2020.

    Comments: Submitted to ISCA Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020

  42. arXiv:2008.04578

    eess.AS cs.CY cs.SD

    Why Did the x-Vector System Miss a Target Speaker? Impact of Acoustic Mismatch Upon Target Score on VoxCeleb Data

    Authors: Rosa González Hautamäki, Tomi Kinnunen

    Abstract: Modern automatic speaker verification (ASV) relies heavily on machine learning implemented through deep neural networks. It can be difficult to interpret the output of these black boxes. In line with interpretative machine learning, we model the dependency of the ASV detection score on the acoustic mismatch between the enrollment and test utterances. We aim to identify mismatch factors that explain target sp…

    Submitted 11 August, 2020; originally announced August 2020.

    Comments: Accepted to INTERSPEECH 2020

  43. arXiv:2008.03590

    eess.AS cs.LG stat.ML

    Extrapolating false alarm rates in automatic speaker verification

    Authors: Alexey Sholokhov, Tomi Kinnunen, Ville Vestman, Kong Aik Lee

    Abstract: Automatic speaker verification (ASV) vendors and corpus providers would both benefit from tools to reliably extrapolate performance metrics for large speaker populations without collecting new speakers. We address false alarm rate extrapolation under a worst-case model whereby an adversary identifies the closest impostor for a given target speaker from a large population. Our models are generative…

    Submitted 8 August, 2020; originally announced August 2020.

    Comments: Accepted for publication at Interspeech 2020

  44. arXiv:2007.15283

    eess.AS cs.LG cs.SD

    A Comparative Re-Assessment of Feature Extractors for Deep Speaker Embeddings

    Authors: Xuechen Liu, Md Sahidullah, Tomi Kinnunen

    Abstract: Modern automatic speaker verification relies largely on deep neural networks (DNNs) trained on mel-frequency cepstral coefficient (MFCC) features. While there are alternative feature extraction methods based on phase, prosody and long-term temporal operations, they have not been extensively studied with DNN-based methods. We aim to fill this gap by providing an extensive re-assessment of 14 feature e…

    Submitted 30 July, 2020; originally announced July 2020.

    Comments: Accepted to Interspeech 2020

  45. arXiv:2007.13118

    eess.AS cs.CV cs.SD

    UIAI System for Short-Duration Speaker Verification Challenge 2020

    Authors: Md Sahidullah, Achintya Kumar Sarkar, Ville Vestman, Xuechen Liu, Romain Serizel, Tomi Kinnunen, Zheng-Hua Tan, Emmanuel Vincent

    Abstract: In this work, we describe the UIAI entry to the short-duration speaker verification (SdSV) challenge 2020. Our focus is on Task 1, dedicated to text-dependent speaker verification. We investigate different feature extraction and modeling approaches for automatic speaker verification (ASV) and utterance verification (UV). We have also studied different fusion strategies for…

    Submitted 26 July, 2020; originally announced July 2020.

  46. arXiv:2007.05979

    eess.AS cs.LG cs.SD eess.SP

    Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals

    Authors: Tomi Kinnunen, Héctor Delgado, Nicholas Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang, Md Sahidullah, Junichi Yamagishi, Douglas A. Reynolds

    Abstract: Recent years have seen growing efforts to develop spoofing countermeasures (CMs) to protect automatic speaker verification (ASV) systems from being deceived by manipulated or artificial inputs. The reliability of spoofing CMs is typically gauged using the equal error rate (EER) metric. The basic EER metric fails to reflect application requirements and the impact of spoofing and CMs upon ASV and its u…

    Submitted 25 August, 2020; v1 submitted 12 July, 2020; originally announced July 2020.

    Comments: Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing (doi updated)

  47. arXiv:2004.08849

    eess.AS cs.CR

    The Attacker's Perspective on Automatic Speaker Verification: An Overview

    Authors: Rohan Kumar Das, Xiaohai Tian, Tomi Kinnunen, Haizhou Li

    Abstract: Security of automatic speaker verification (ASV) systems is compromised by various spoofing attacks. While many types of non-proactive attacks (and their defenses) have been studied in the past, the attacker's perspective on ASV represents a far less explored direction. It can potentially help to identify the weakest parts of ASV systems and be used to develop attacker-aware systems. We present an ov…

    Submitted 19 April, 2020; originally announced April 2020.

    Comments: 5 pages, 1 figure, Submitted to Interspeech 2020

  48. arXiv:2004.01922

    eess.AS

    Subband modeling for spoofing detection in automatic speaker verification

    Authors: Bhusan Chettri, Tomi Kinnunen, Emmanouil Benetos

    Abstract: Spectrograms (time-frequency representations of audio signals) have found widespread use in neural network-based spoofing detection. While deep models are trained on the full-band spectrum of the signal, we argue that not all frequency bands are useful for these tasks. In this paper, we systematically investigate the impact of different subbands and their importance on replay spoofing detection o…

    Submitted 4 April, 2020; originally announced April 2020.

    Comments: Accepted to the Speaker Odyssey (The Speaker and Language Recognition Workshop) 2020 conference. 8 pages

  49. arXiv:2004.01559

    eess.AS cs.LG

    Neural i-vectors

    Authors: Ville Vestman, Kong Aik Lee, Tomi H. Kinnunen

    Abstract: Deep speaker embeddings have been demonstrated to outperform their generative counterparts, i-vectors, in recent speaker verification evaluations. To combine the benefits of high performance and generative interpretation, we investigate using a deep embedding extractor and an i-vector extractor in succession. To bundle the deep embedding extractor with an i-vector extractor, we adopt aggregation l…

    Submitted 18 April, 2020; v1 submitted 3 April, 2020; originally announced April 2020.

    Comments: Accepted to Odyssey 2020: The Speaker and Language Recognition Workshop. Version 2 (bugfix)

  50. arXiv:2003.09542

    eess.AS

    Deep Generative Variational Autoencoding for Replay Spoof Detection in Automatic Speaker Verification

    Authors: Bhusan Chettri, Tomi Kinnunen, Emmanouil Benetos

    Abstract: Automatic speaker verification (ASV) systems are highly vulnerable to presentation attacks, also called spoofing attacks. Replay is among the simplest attacks to mount, yet it is difficult to detect reliably. The generalization failure of spoofing countermeasures (CMs) has driven the community to study various alternative deep learning CMs. The majority of them are supervised approaches that learn a hu…

    Submitted 20 March, 2020; originally announced March 2020.

    Comments: Accepted to Computer Speech and Language Special issue on Advances in Automatic Speaker Verification Anti-spoofing, 2020
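Item 37 (arXiv:2102.10322, "Learnable MFCCs for Speaker Verification") refers to "the four linear transforms of a standard MFCC extractor". Because that abstract is truncated, the following is only a sketch under an assumption: that the four transforms are the analysis window, the DFT, the mel filterbank, and the DCT, interleaved with two fixed nonlinearities (squared magnitude and log). It computes conventional, non-learnable MFCCs for a single frame in plain Python. The function names, the Hamming window, the 20-filter / 13-coefficient defaults, and the 1e-10 log floor are illustrative choices, not the paper's configuration.

```python
import cmath
import math


def hamming(n):
    # Window coefficients; multiplying the frame by them is a diagonal linear map.
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft(x):
    # Naive DFT of a real frame, keeping bins 0..n/2 (a matrix-vector product).
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(num_filters, n_fft, sample_rate):
    # Triangular filters equally spaced on the mel scale; each row linearly weights DFT bins.
    lo, hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(lo + (hi - lo) * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, num_filters + 1):
        left, center, right = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in bin_freqs:
            if left <= f <= center:
                row.append((f - left) / (center - left))
            elif center < f <= right:
                row.append((right - f) / (right - center))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dct2(x, num_coeffs):
    # Unnormalized DCT-II of the log filterbank energies.
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(num_coeffs)]


def mfcc_frame(frame, sample_rate, num_filters=20, num_coeffs=13):
    windowed = [s * w for s, w in zip(frame, hamming(len(frame)))]       # linear 1: window
    power = [abs(c) ** 2 for c in dft(windowed)]                          # linear 2: DFT, then |.|^2
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    energies = [sum(w * p for w, p in zip(row, power)) for row in bank]  # linear 3: mel filterbank
    log_energies = [math.log(e + 1e-10) for e in energies]               # nonlinear: log
    return dct2(log_energies, num_coeffs)                                  # linear 4: DCT
```

Every step except the squared magnitude and the log is a matrix-vector product, so a learnable frontend could, in principle, treat those four matrices as trainable parameters initialized from the values above; how the paper actually parameterizes them cannot be recovered from the truncated abstract. One property is easy to check: scaling the frame by a constant gain only shifts the zeroth coefficient (up to the small log floor), because the log turns the gain into a constant offset and the DCT-II maps a constant offset entirely onto c0.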
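Item 43 (arXiv:2008.03590, "Extrapolating false alarm rates in automatic speaker verification") describes a worst-case model in which an adversary picks the closest impostor for a target speaker from a large population. The authors' generative models are not reproduced here. As a much simpler illustration, assuming (hypothetically) that impostor trials are independent and each is accepted with the same single-trial false alarm rate p, the chance that at least one of N impostors is accepted is 1 - (1 - p)^N. The sketch below computes that quantity and checks it by simulation under a hypothetical standard-normal impostor score model; all function names are illustrative.

```python
import math
import random


def worst_case_far(single_trial_far, population_size):
    # P(at least one of N impostors is accepted) = 1 - (1 - p)^N,
    # assuming independent, identically distributed impostor trials.
    if not 0.0 <= single_trial_far <= 1.0:
        raise ValueError("single_trial_far must lie in [0, 1]")
    if population_size < 0:
        raise ValueError("population_size must be non-negative")
    if population_size == 0:
        return 0.0
    if single_trial_far == 1.0:
        return 1.0
    # log1p/expm1 keep precision when p is tiny and N is large.
    return -math.expm1(population_size * math.log1p(-single_trial_far))


def gaussian_tail(threshold):
    # P(score >= threshold) for a standard-normal impostor score.
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


def simulate_worst_case_far(threshold, population_size, num_targets=2000, seed=0):
    # Monte Carlo check: for each target, draw N standard-normal impostor scores
    # and count an acceptance if the closest (highest-scoring) impostor reaches
    # the threshold.
    if population_size < 1:
        raise ValueError("population_size must be at least 1")
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_targets):
        closest = max(rng.gauss(0.0, 1.0) for _ in range(population_size))
        hits += closest >= threshold
    return hits / num_targets
```

For example, worst_case_far(0.01, 100) is about 0.634: a 1% single-trial false alarm rate becomes a roughly 63% chance that the best of 100 impostors gets through. The independence assumption is the main simplification; real impostor scores for a given target are correlated, which is one reason a dedicated model such as the one the paper proposes is needed.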
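Item 46 (arXiv:2007.05979, "Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals") notes that CMs are typically evaluated with the equal error rate (EER), the operating point where the miss rate equals the false alarm rate. A minimal threshold-sweep sketch of the EER follows, assuming higher scores indicate the target (or bona fide) class. The function name is hypothetical, and the code does not implement the tandem metric that the paper itself develops.

```python
def compute_eer(target_scores, nontarget_scores):
    # Sweep every observed score as a threshold; accept a trial when score >= threshold.
    # Returns (eer, threshold) at the point where miss and false alarm rates are closest.
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one score per class")
    best = None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(p_miss - p_fa)
        if best is None or gap < best[0]:
            # Report the midpoint of the two rates when they do not match exactly.
            best = (gap, (p_miss + p_fa) / 2.0, t)
    return best[1], best[2]


eer, threshold = compute_eer([0.2, 0.6, 0.9], [0.1, 0.4, 0.7])
# eer is 1/3 (one miss out of three targets, one false alarm out of three nontargets)
```

Sweeping only the observed scores keeps the code short but gives a discrete approximation; some toolkits instead interpolate the ROC curve (for example via its convex hull) and can report slightly different values on small score sets. The counting loop is also quadratic in the number of scores, so a sort-based pass would be preferable for full evaluation lists.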