Showing 1–50 of 163 results for author: Yamagishi, J

  1. arXiv:2406.10836  [pdf, other]

    eess.AS cs.SD

    Revisiting and Improving Scoring Fusion for Spoofing-aware Speaker Verification Using Compositional Data Analysis

    Authors: Xin Wang, Tomi Kinnunen, Kong Aik Lee, Paul-Gauthier Noé, Junichi Yamagishi

    Abstract: Fusing outputs from automatic speaker verification (ASV) and spoofing countermeasure (CM) is expected to make an integrated system robust to zero-effort imposters and synthesized spoofing attacks. Many score-level fusion methods have been proposed, but many remain heuristic. This paper revisits score-level fusion using tools from decision theory and presents three main findings. First, fusion by s… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024 Accepted. https://github.com/nii-yamagishilab/SpeechSPC-mini

  2. arXiv:2406.08911  [pdf, other]

    cs.CL eess.AS

    An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios

    Authors: Cheng Gong, Erica Cooper, Xin Wang, Chunyu Qiang, Mengzhe Geng, Dan Wells, Longbiao Wang, Jianwu Dang, Marc Tessier, Aidan Pine, Korin Richmond, Junichi Yamagishi

    Abstract: Self-supervised learning (SSL) representations from massively multilingual models offer a promising solution for low-resource language speech tasks. Despite advancements, language adaptation in TTS systems remains an open problem. This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system proposed in our previous work. We conducted experiments on… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  3. arXiv:2406.08812  [pdf, other]

    cs.SD eess.AS

    Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems

    Authors: Zhengyang Chen, Xuechen Liu, Erica Cooper, Junichi Yamagishi, Yanmin Qian

    Abstract: This paper proposes a speech synthesis system that allows users to specify and control the acoustic characteristics of a speaker by means of prompts describing the speaker's traits of synthesized speech. Unlike previous approaches, our method utilizes listener impressions to construct prompts, which are easier to collect and align more naturally with everyday descriptions of speaker traits. We ado… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted for presentation at Interspeech 2024 (with more analysis in the final Appendix part)

  4. arXiv:2406.07845  [pdf, other]

    eess.AS cs.SD

    Target Speaker Extraction with Curriculum Learning

    Authors: Yun Liu, Xuechen Liu, Xiaoxiao Miao, Junichi Yamagishi

    Abstract: This paper presents a novel approach to target speaker extraction (TSE) using Curriculum Learning (CL) techniques, addressing the challenge of distinguishing a target speaker's voice from a mixture containing interfering speakers. For efficient training, we propose designing a curriculum that selects subsets of increasing complexity, such as increasing similarity between target and interfering spe… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted for presentation at Interspeech 2024

  5. arXiv:2406.07816  [pdf, other]

    eess.AS cs.CL cs.SD

    Spoof Diarization: "What Spoofed When" in Partially Spoofed Audio

    Authors: Lin Zhang, Xin Wang, Erica Cooper, Mireia Diez, Federico Landini, Nicholas Evans, Junichi Yamagishi

    Abstract: This paper defines Spoof Diarization as a novel task in the Partial Spoof (PS) scenario. It aims to determine what spoofed when, which includes not only locating spoof regions but also clustering them according to different spoofing methods. As a pioneering study in spoof diarization, we focus on defining the task, establishing evaluation metrics, and proposing a benchmark model, namely the Counte… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  6. arXiv:2406.05339  [pdf, other]

    eess.AS cs.AI

    To what extent can ASV systems naturally defend against spoofing attacks?

    Authors: Jee-weon Jung, Xin Wang, Nicholas Evans, Shinji Watanabe, Hye-jin Shim, Hemlata Tak, Siddhant Arora, Junichi Yamagishi, Joon Son Chung

    Abstract: The current automatic speaker verification (ASV) task involves making binary decisions on two types of trials: target and non-target. However, emerging advancements in speech generation technology pose significant threats to the reliability of ASV systems. This study investigates whether ASV effortlessly acquires robustness against spoofing attacks (i.e., zero-shot capability) by systematically ex… ▽ More

    Submitted 14 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: 5 pages, 3 figures, 3 tables, Interspeech 2024

  7. arXiv:2405.00355  [pdf, other]

    cs.CV

    Exploring Self-Supervised Vision Transformers for Deepfake Detection: A Comparative Analysis

    Authors: Huy H. Nguyen, Junichi Yamagishi, Isao Echizen

    Abstract: This paper investigates the effectiveness of self-supervised pre-trained transformers compared to supervised pre-trained transformers and conventional neural networks (ConvNets) for detecting various types of deepfakes. We focus on their potential for improved generalization, particularly when training data is limited. Despite the notable success of large vision-language models utilizing transform… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

  8. arXiv:2404.02677  [pdf, other]

    eess.AS cs.CL cs.CR

    The VoicePrivacy 2024 Challenge Evaluation Plan

    Authors: Natalia Tomashenko, Xiaoxiao Miao, Pierre Champion, Sarina Meyer, Xin Wang, Emmanuel Vincent, Michele Panariello, Nicholas Evans, Junichi Yamagishi, Massimiliano Todisco

    Abstract: The task of the challenge is to develop a voice anonymization system for speech data which conceals the speaker's voice identity while protecting linguistic content and emotional states. The organizers provide development and evaluation datasets and evaluation scripts, as well as baseline anonymization systems and a list of training resources formed on the basis of the participants' requests. Part… ▽ More

    Submitted 12 June, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: 19 pages, https://www.voiceprivacychallenge.org/. arXiv admin note: substantial text overlap with arXiv:2203.12468

  9. arXiv:2403.17361  [pdf, other]

    cs.CL cs.AI

    Bridging Textual and Tabular Worlds for Fact Verification: A Lightweight, Attention-Based Model

    Authors: Shirin Dabbaghi Varnosfaderani, Canasai Kruengkrai, Ramin Yahyapour, Junichi Yamagishi

    Abstract: FEVEROUS is a benchmark and research initiative focused on fact extraction and verification tasks involving unstructured text and structured tabular data. In FEVEROUS, existing works often rely on extensive preprocessing and utilize rule-based transformations of data, leading to potential context loss or misleading encodings. This paper introduces a simple yet powerful model that nullifies the nee… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted for a presentation at LREC-COLING 2024 - The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation

  10. arXiv:2312.15616  [pdf, other]

    cs.SD eess.AS stat.ML

    Uncertainty as a Predictor: Leveraging Self-Supervised Learning for Zero-Shot MOS Prediction

    Authors: Aditya Ravuri, Erica Cooper, Junichi Yamagishi

    Abstract: Predicting audio quality in voice synthesis and conversion systems is a critical yet challenging task, especially when traditional methods like Mean Opinion Scores (MOS) are cumbersome to collect at scale. This paper addresses the gap in efficient audio quality prediction, especially in low-resource settings where extensive MOS data from large-scale listening tests may be unavailable. We demonstra… ▽ More

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: 5 pages, 3 figures, sasb draft

  11. arXiv:2312.14398  [pdf, other]

    cs.SD eess.AS

    ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations

    Authors: Cheng Gong, Xin Wang, Erica Cooper, Dan Wells, Longbiao Wang, Jianwu Dang, Korin Richmond, Junichi Yamagishi

    Abstract: Neural text-to-speech (TTS) has achieved human-like synthetic speech for single-speaker, single-language synthesis. Multilingual TTS systems are limited to resource-rich languages due to the lack of large paired text and studio-quality audio data. In most cases, TTS systems are built using a single speaker's voice. However, there is growing interest in developing systems that can synthesize voices… ▽ More

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: 13 pages, 5 figures

  12. arXiv:2312.06055  [pdf, other]

    cs.SD eess.AS

    Speaker-Text Retrieval via Contrastive Learning

    Authors: Xuechen Liu, Xin Wang, Erica Cooper, Xiaoxiao Miao, Junichi Yamagishi

    Abstract: In this study, we introduce a novel cross-modal retrieval task involving speaker descriptions and their corresponding audio samples. Utilizing pre-trained speaker and text encoders, we present a simple learning framework based on contrastive learning. Additionally, we explore the impact of incorporating speaker labels into the training process. Our findings establish the effectiveness of linking s… ▽ More

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: Submitted to IEEE Signal Processing Letters

  13. arXiv:2310.16278  [pdf, other]

    cs.CL cs.AI

    XFEVER: Exploring Fact Verification across Languages

    Authors: Yi-Chen Chang, Canasai Kruengkrai, Junichi Yamagishi

    Abstract: This paper introduces the Cross-lingual Fact Extraction and VERification (XFEVER) dataset designed for benchmarking the fact verification models across different languages. We constructed it by translating the claim and evidence texts of the Fact Extraction and VERification (FEVER) dataset into six languages. The training and development sets were translated using machine translation, whereas the… ▽ More

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted for an oral presentation at the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023)

  14. arXiv:2310.06851  [pdf, other]

    cs.CV cs.AI cs.GR

    BodyFormer: Semantics-guided 3D Body Gesture Synthesis with Transformer

    Authors: Kunkun Pang, Dafei Qin, Yingruo Fan, Julian Habekost, Takaaki Shiratori, Junichi Yamagishi, Taku Komura

    Abstract: Automatic gesture synthesis from speech is a topic that has attracted researchers for applications in remote communication, video games and Metaverse. Learning the mapping between speech and 3D full-body gestures is difficult due to the stochastic nature of the problem and the lack of a rich cross-modal dataset that is needed for training. In this paper, we propose a novel transformer-based framew… ▽ More

    Submitted 6 September, 2023; originally announced October 2023.

    Comments: 12 pages, 13 figures

  15. arXiv:2310.05078  [pdf, other]

    eess.AS cs.SD

    Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-supervised setting

    Authors: Hemant Yadav, Erica Cooper, Junichi Yamagishi, Sunayana Sitaram, Rajiv Ratn Shah

    Abstract: This paper introduces a novel objective function for quality mean opinion score (MOS) prediction of unseen speech synthesis systems. The proposed function measures the similarity of relative positions of predicted MOS values in a mini-batch, rather than the actual MOS values. That is, the partial rank similarity (PRS) is measured rather than the individual MOS values as with the L1 loss. Our exper… ▽ More

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted to ASRU 2023

  16. arXiv:2310.02640  [pdf, other]

    eess.AS

    The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains

    Authors: Erica Cooper, Wen-Chin Huang, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi

    Abstract: We present the second edition of the VoiceMOS Challenge, a scientific event that aims to promote the study of automatic prediction of the mean opinion score (MOS) of synthesized and processed speech. This year, we emphasize real-world and challenging zero-shot out-of-domain MOS prediction with three tracks for three different voice evaluation scenarios. Ten teams from industry and academia in seve… ▽ More

    Submitted 6 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted to ASRU 2023

  17. arXiv:2310.00922  [pdf, other]

    cs.CV

    How Close are Other Computer Vision Tasks to Deepfake Detection?

    Authors: Huy H. Nguyen, Junichi Yamagishi, Isao Echizen

    Abstract: In this paper, we challenge the conventional belief that supervised ImageNet-trained models have strong generalizability and are suitable for use as feature extractors in deepfake detection. We present a new measurement, "model separability," for visually and quantitatively assessing a model's raw capacity to separate data in an unsupervised manner. We also present a systematic benchmark for deter… ▽ More

    Submitted 2 October, 2023; originally announced October 2023.

    Comments: Accepted to be Published in Proceedings of the IEEE International Joint Conference on Biometrics (IJCB 2023)

  18. arXiv:2309.09586  [pdf, ps, other]

    cs.CR cs.SD eess.AS

    Spoofing attack augmentation: can differently-trained attack models improve generalisation?

    Authors: Wanying Ge, Xin Wang, Junichi Yamagishi, Massimiliano Todisco, Nicholas Evans

    Abstract: A reliable deepfake detector or spoofing countermeasure (CM) should be robust in the face of unpredictable spoofing attacks. To encourage the learning of more generalisable artefacts, rather than those specific only to known attacks, CMs are usually exposed to a broad variety of different attacks during training. Even so, the performance of deep-learning-based CM solutions is known to vary, some… ▽ More

    Submitted 8 January, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: Accepted to ICASSP 2024

  19. arXiv:2309.07658  [pdf, other]

    cs.SD eess.AS

    DDSP-based Neural Waveform Synthesis of Polyphonic Guitar Performance from String-wise MIDI Input

    Authors: Nicolas Jonason, Xin Wang, Erica Cooper, Lauri Juvela, Bob L. T. Sturm, Junichi Yamagishi

    Abstract: We explore the use of neural synthesis for acoustic guitar from string-wise MIDI input. We propose four different systems and compare them with both objective metrics and subjective evaluation against natural audio and a sample-based baseline. We iteratively develop these four systems by making various considerations on the architecture and intermediate tasks, such as predicting pitch and loudness… ▽ More

    Submitted 14 September, 2023; originally announced September 2023.

  20. arXiv:2309.06141  [pdf, other]

    cs.SD eess.AS

    SynVox2: Towards a privacy-friendly VoxCeleb2 dataset

    Authors: Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Nicholas Evans, Massimiliano Todisco, Jean-François Bonastre, Mickael Rouvier

    Abstract: The success of deep learning in speaker recognition relies heavily on the use of large datasets. However, the data-hungry nature of deep learning methods has already been questioned on account of the ethical, privacy, and legal concerns that arise when using large-scale datasets of natural speech collected from real human speakers. For example, the widely-used VoxCeleb2 dataset for speaker recogniti… ▽ More

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: conference

  21. arXiv:2309.06014  [pdf, other]

    eess.AS cs.SD

    Can large-scale vocoded spoofed data improve speech spoofing countermeasure with a self-supervised front end?

    Authors: Xin Wang, Junichi Yamagishi

    Abstract: A speech spoofing countermeasure (CM) that discriminates between unseen spoofed and bona fide data requires diverse training data. While many datasets use spoofed data generated by speech synthesis systems, it was recently found that data vocoded by neural vocoders were also effective as the spoofed training data. Since many neural vocoders are fast in building and generation, this study used mult… ▽ More

    Submitted 27 December, 2023; v1 submitted 12 September, 2023; originally announced September 2023.

    Comments: To appear in ICASSP 2024. code on github: https://github.com/nii-yamagishilab/project-NN-Pytorch-scripts/tree/master/project/10-asvspoof-vocoded-trn-ssl

  22. arXiv:2306.08850  [pdf, other]

    cs.SD eess.AS

    Exploring Isolated Musical Notes as Pre-training Data for Predominant Instrument Recognition in Polyphonic Music

    Authors: Lifan Zhong, Erica Cooper, Junichi Yamagishi, Nobuaki Minematsu

    Abstract: With the growing amount of musical data available, automatic instrument recognition, one of the essential problems in Music Information Retrieval (MIR), is drawing more and more attention. While automatic recognition of single instruments has been well-studied, it remains challenging for polyphonic, multi-instrument musical recordings. This work presents our efforts toward building a robust end-to… ▽ More

    Submitted 15 June, 2023; originally announced June 2023.

    Comments: Submitted to APSIPA 2023

  23. arXiv:2305.19051  [pdf, other]

    eess.AS cs.AI cs.SD

    Towards single integrated spoofing-aware speaker verification embeddings

    Authors: Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang, Xuechen Liu, Md Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung

    Abstract: This study aims to develop single integrated spoofing-aware speaker verification (SASV) embeddings that satisfy two aspects. First, rejecting non-target speakers' input as well as target speakers' spoofed inputs should be addressed. Second, competitive performance should be demonstrated compared to the fusion of automatic speaker verification (ASV) and countermeasure (CM) embeddings, which outpe… ▽ More

    Submitted 1 June, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted by INTERSPEECH 2023. Code and models are available in https://github.com/sasv-challenge/ASVSpoof5-SASVBaseline

  24. arXiv:2305.18823  [pdf, other]

    cs.SD eess.AS

    Speaker anonymization using orthogonal Householder neural network

    Authors: Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Natalia Tomashenko

    Abstract: Speaker anonymization aims to conceal a speaker's identity while preserving content information in speech. Current mainstream neural-network speaker anonymization systems disentangle speech into prosody-related, content, and speaker representations. The speaker representation is then anonymized by a selection-based speaker anonymizer that uses a mean vector over a set of randomly selected speaker… ▽ More

    Submitted 12 September, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing

  25. arXiv:2305.17739  [pdf, other]

    cs.SD cs.CL eess.AS

    Range-Based Equal Error Rate for Spoof Localization

    Authors: Lin Zhang, Xin Wang, Erica Cooper, Nicholas Evans, Junichi Yamagishi

    Abstract: Spoof localization, also called segment-level detection, is a crucial task that aims to locate spoofs in partially spoofed audio. The equal error rate (EER) is widely used to measure performance for such biometric scenarios. Although EER is the only threshold-free metric, it is usually calculated in a point-based way that uses scores and references with a pre-defined temporal resolution and counts… ▽ More

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: Accepted to Interspeech 2023

  26. arXiv:2305.10940  [pdf, other]

    eess.AS

    Improving Generalization Ability of Countermeasures for New Mismatch Scenario by Combining Multiple Advanced Regularization Terms

    Authors: Chang Zeng, Xin Wang, Xiaoxiao Miao, Erica Cooper, Junichi Yamagishi

    Abstract: The ability of countermeasure models to generalize from seen speech synthesis methods to unseen ones has been investigated in the ASVspoof challenge. However, a new mismatch scenario in which fake audio may be generated from real audio with unseen genres has not been studied thoroughly. To this end, we first use five different vocoders to create a new dataset called CN-Spoof based on the CN-Celeb1… ▽ More

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted by Interspeech 2023

  27. arXiv:2305.10608  [pdf, other]

    eess.AS

    Investigating Range-Equalizing Bias in Mean Opinion Score Ratings of Synthesized Speech

    Authors: Erica Cooper, Junichi Yamagishi

    Abstract: Mean Opinion Score (MOS) is a popular measure for evaluating synthesized speech. However, the scores obtained in MOS tests are heavily dependent upon many contextual factors. One such factor is the overall range of quality of the samples presented in the test -- listeners tend to try to use the entire range of scoring options available to them regardless of this, a phenomenon which is known as ran… ▽ More

    Submitted 6 October, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: Proceedings of Interspeech 2023. DOI: 10.21437/Interspeech.2023-1076

  28. arXiv:2304.04239  [pdf, other]

    q-bio.CB physics.bio-ph

    Universal Transitions between Growth and Dormancy via Intermediate Complex Formation

    Authors: Jumpei F. Yamagishi, Kunihiko Kaneko

    Abstract: A simple cell model consisting of a catalytic reaction network with intermediate complex formation is numerically studied. As nutrients are depleted, the transition from the exponential growth phase to the growth-arrested dormant phase occurs along with hysteresis and a lag time for growth recovery. This transition is caused by the accumulation of intermediate complexes, leading to the jamming of… ▽ More

    Submitted 9 April, 2023; originally announced April 2023.

    Comments: 6+6 pages, 3+6 figures

  29. arXiv:2303.02659  [pdf, other]

    cs.CR cs.MM

    Cyber Vaccine for Deepfake Immunity

    Authors: Ching-Chun Chang, Huy Hong Nguyen, Junichi Yamagishi, Isao Echizen

    Abstract: Deepfakes pose an evolving threat to cybersecurity, which calls for the development of automated countermeasures. While considerable forensic research has been devoted to the detection and localisation of deepfakes, solutions for reversing fake to real are yet to be developed. In this study, we introduce cyber vaccination for conferring immunity to deepfakes. Analogous to biological vaccination th… ▽ More

    Submitted 5 March, 2023; originally announced March 2023.

  30. arXiv:2211.16065  [pdf, other]

    eess.AS cs.SD

    Hiding speaker's sex in speech using zero-evidence speaker representation in an analysis/synthesis pipeline

    Authors: Paul-Gauthier Noé, Xiaoxiao Miao, Xin Wang, Junichi Yamagishi, Jean-François Bonastre, Driss Matrouf

    Abstract: The use of modern vocoders in an analysis/synthesis pipeline allows us to investigate high-quality voice conversion that can be used for privacy purposes. Here, we propose to transform the speaker embedding and the pitch in order to hide the sex of the speaker. ECAPA-TDNN-based speaker representation fed into a HiFiGAN vocoder is protected using a neural-discriminant analysis approach, which is co… ▽ More

    Submitted 24 March, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

    Comments: Accepted to ICASSP 2023

  31. arXiv:2211.13868  [pdf, other]

    cs.SD eess.AS

    Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems?

    Authors: Xuan Shi, Erica Cooper, Xin Wang, Junichi Yamagishi, Shrikanth Narayanan

    Abstract: With the similarity between music and speech synthesis from symbolic input and the rapid development of text-to-speech (TTS) techniques, it is worthwhile to explore ways to improve the MIDI-to-audio performance by borrowing from TTS techniques. In this study, we analyze the shortcomings of a TTS-based MIDI-to-audio system and improve it in terms of feature computation, model selection, and trainin… ▽ More

    Submitted 20 March, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

    Comments: Accepted by ICASSP 2023

  32. arXiv:2210.15183  [pdf, other]

    cs.CL cs.CY cs.LG

    Outlier-Aware Training for Improving Group Accuracy Disparities

    Authors: Li-Kuang Chen, Canasai Kruengkrai, Junichi Yamagishi

    Abstract: Methods addressing spurious correlations such as Just Train Twice (JTT, arXiv:2107.09044v2) involve reweighting a subset of the training set to maximize the worst-group accuracy. However, the reweighted set of examples may potentially contain unlearnable examples that hamper the model's learning. We propose mitigating this by detecting outliers to the training set and removing them before reweight… ▽ More

    Submitted 27 October, 2022; originally announced October 2022.

  33. arXiv:2210.14508  [pdf, other]

    q-bio.MN physics.bio-ph q-bio.QM

    Linear Response Theory of Evolved Metabolic Systems

    Authors: Jumpei F. Yamagishi, Tetsuhiro S. Hatakeyama

    Abstract: Predicting cellular metabolic states is a central problem in biophysics. Conventional approaches, however, sensitively depend on the microscopic details of individual metabolic systems. In this Letter, we derived a universal linear relationship between the metabolic responses against nutrient conditions and metabolic inhibition, with the aid of a microeconomic theory. The relationship holds in arb… ▽ More

    Submitted 11 July, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: 6+6 pages, 3+4 figures, 1 table

  34. arXiv:2210.10667  [pdf, other]

    cs.CV

    Analysis of Master Vein Attacks on Finger Vein Recognition Systems

    Authors: Huy H. Nguyen, Trung-Nghia Le, Junichi Yamagishi, Isao Echizen

    Abstract: Finger vein recognition (FVR) systems have been commercially used, especially in ATMs, for customer verification. Thus, it is essential to measure their robustness against various attack methods, especially when a hand-crafted FVR system is used without any countermeasure methods. In this paper, we are the first in the literature to introduce master vein attacks in which we craft a vein-looking im… ▽ More

    Submitted 18 October, 2022; originally announced October 2022.

    Comments: Accepted to be Published in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023

  35. arXiv:2210.10570  [pdf, other]

    eess.AS cs.SD

    Spoofed training data for speech spoofing countermeasure can be efficiently created using neural vocoders

    Authors: Xin Wang, Junichi Yamagishi

    Abstract: A good training set for speech spoofing countermeasures requires diverse TTS and VC spoofing attacks, but generating TTS and VC spoofed trials for a target speaker may be technically demanding. Instead of using full-fledged TTS and VC systems, this study uses neural-network-based vocoders to do copy-synthesis on bona fide utterances. The output data can be used as spoofed data. To make better use… ▽ More

    Submitted 22 February, 2023; v1 submitted 19 October, 2022; originally announced October 2022.

    Comments: ICASSP 2023 accepted. Code: https://github.com/nii-yamagishilab/project-NN-Pytorch-scripts/tree/master/project/09-asvspoof-vocoded-trn

  36. arXiv:2210.02437  [pdf, other]

    cs.SD cs.CR cs.MM eess.AS

    ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild

    Authors: Xuechen Liu, Xin Wang, Md Sahidullah, Jose Patino, Héctor Delgado, Tomi Kinnunen, Massimiliano Todisco, Junichi Yamagishi, Nicholas Evans, Andreas Nautsch, Kong Aik Lee

    Abstract: Benchmarking initiatives support the meaningful comparison of competing solutions to prominent problems in speech and language processing. Successive benchmarking evaluations typically reflect a progressive evolution from ideal lab conditions towards those encountered in the wild. ASVspoof, the spoofing and deepfake detection initiative and challenge series, has followed the same trend. This ar… ▽ More

    Submitted 22 June, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing

  37. arXiv:2209.00485  [pdf, other]

    eess.AS cs.SD

    Joint Speaker Encoder and Neural Back-end Model for Fully End-to-End Automatic Speaker Verification with Multiple Enrollment Utterances

    Authors: Chang Zeng, Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi

    Abstract: Conventional automatic speaker verification systems can usually be decomposed into a front-end model such as time delay neural network (TDNN) for extracting speaker embeddings and a back-end model such as statistics-based probabilistic linear discriminant analysis (PLDA) or neural network-based neural PLDA (NPLDA) for similarity scoring. However, the sequential optimization of the front-end and ba… ▽ More

    Submitted 1 September, 2022; originally announced September 2022.

    Comments: Submitted to TASLP

  38. Spoofing-Aware Attention based ASV Back-end with Multiple Enrollment Utterances and a Sampling Strategy for the SASV Challenge 2022

    Authors: Chang Zeng, Lin Zhang, Meng Liu, Junichi Yamagishi

    Abstract: Current state-of-the-art automatic speaker verification (ASV) systems are vulnerable to presentation attacks, and several countermeasures (CMs), which distinguish bona fide trials from spoofing ones, have been explored to protect ASV. However, ASV systems and CMs are generally developed and optimized independently without considering their inter-relationship. In this paper, we propose a new spoofi… ▽ More

    Submitted 1 September, 2022; originally announced September 2022.

    Comments: Accepted by Interspeech 2022

  39. arXiv:2207.04640  [pdf, other]

    q-bio.PE cond-mat.stat-mech physics.bio-ph

    A geometric speed limit for acceleration by natural selection in evolutionary processes

    Authors: Masahiro Hoshino, Ryuna Nagayama, Kohei Yoshimura, Jumpei F. Yamagishi, Sosuke Ito

    Abstract: We derived a new speed limit in population dynamics, which is a fundamental limit on the evolutionary rate. By splitting the contributions of selection and mutation to the evolutionary rate, we obtained the new bound on the speed of arbitrary observables, named the selection bound, that can be tighter than the conventional Cramér--Rao bound. Remarkably, the selection bound can be much tighter if t… ▽ More

    Submitted 12 January, 2023; v1 submitted 11 July, 2022; originally announced July 2022.

    Comments: 11 pages, 7 figures

  40. arXiv:2205.07123  [pdf, other]

    cs.CL cs.CR eess.AS

    The VoicePrivacy 2020 Challenge Evaluation Plan

    Authors: Natalia Tomashenko, Brij Mohan Lal Srivastava, Xin Wang, Emmanuel Vincent, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Jose Patino, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco

    Abstract: The VoicePrivacy Challenge aims to promote the development of privacy preservation tools for speech technology by gathering a new community to define the tasks of interest and the evaluation methodology, and benchmarking solutions through a series of challenges. In this document, we formulate the voice anonymization task selected for the VoicePrivacy 2020 Challenge and describe the datasets used f… ▽ More

    Submitted 14 May, 2022; originally announced May 2022.

    Comments: arXiv admin note: text overlap with arXiv:2203.12468

  41. arXiv:2204.05177  [pdf, other]

    eess.AS cs.CR cs.SD

    The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance

    Authors: Lin Zhang, Xin Wang, Erica Cooper, Nicholas Evans, Junichi Yamagishi

    Abstract: Automatic speaker verification is susceptible to various manipulations and spoofing, such as text-to-speech synthesis, voice conversion, replay, tampering, adversarial attacks, and so on. We consider a new spoofing scenario called "Partial Spoof" (PS) in which synthesized or transformed speech segments are embedded into a bona fide utterance. While existing countermeasures (CMs) can detect fully s…

    Submitted 30 January, 2023; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing (DOI: 10.1109/TASLP.2022.3233236)

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 813-825, 2023

  42. arXiv:2203.14834  [pdf, other]

    cs.SD

    Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions

    Authors: Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Natalia Tomashenko

    Abstract: In our previous work, we proposed a language-independent speaker anonymization system based on self-supervised learning models. Although the system can anonymize speech data of any language, the anonymization was imperfect, and the speech content of the anonymized speech was distorted. This limitation is more severe when the input speech is from a domain unseen in the training data. This study ana…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: Submitted to Interspeech 2022

  43. arXiv:2203.14553  [pdf, other]

    eess.AS

    Investigating Active-learning-based Training Data Selection for Speech Spoofing Countermeasure

    Authors: Xin Wang, Junichi Yamagishi

    Abstract: Training a spoofing countermeasure (CM) that generalizes to various unseen data is desired but challenging. While methods such as data augmentation and self-supervised learning are applicable, the imperfect CM performance on diverse test sets still calls for additional strategies. This study took the initiative and investigated CM training using active learning (AL), a framework that iteratively s…

    Submitted 7 October, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

    Comments: To appear in Proc. SLT 2022, modified based on a paper rejected by Interspeech 2022

  44. arXiv:2203.12468  [pdf, other]

    eess.AS cs.CL cs.CR

    The VoicePrivacy 2022 Challenge Evaluation Plan

    Authors: Natalia Tomashenko, Xin Wang, Xiaoxiao Miao, Hubert Nourtel, Pierre Champion, Massimiliano Todisco, Emmanuel Vincent, Nicholas Evans, Junichi Yamagishi, Jean-François Bonastre

    Abstract: For new participants - Executive summary: (1) The task is to develop a voice anonymization system for speech data which conceals the speaker's voice identity while protecting linguistic content, paralinguistic attributes, intelligibility and naturalness. (2) Training, development and evaluation datasets are provided in addition to 3 different baseline anonymization systems, evaluation scripts, and…

    Submitted 28 September, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: the file is unchanged; minor correction in metadata

  45. arXiv:2203.11500  [pdf, other]

    eess.AS cs.SD

    Joint Noise Reduction and Listening Enhancement for Full-End Speech Enhancement

    Authors: Haoyu Li, Yun Liu, Junichi Yamagishi

    Abstract: Speech enhancement (SE) methods mainly focus on recovering clean speech from noisy input. In real-world speech communication, however, noises often exist in not only speaker but also listener environments. Although SE methods can suppress the noise contained in the speaker's voice, they cannot deal with the noise that is physically present in the listener side. To address such a complicated but co…

    Submitted 22 March, 2022; originally announced March 2022.

    Comments: Submitted to Interspeech 2022

  46. arXiv:2203.11389  [pdf, other]

    cs.SD eess.AS

    The VoiceMOS Challenge 2022

    Authors: Wen-Chin Huang, Erica Cooper, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi

    Abstract: We present the first edition of the VoiceMOS Challenge, a scientific event that aims to promote the study of automatic prediction of the mean opinion score (MOS) of synthetic speech. This challenge drew 22 participating teams from academia and industry who tried a variety of approaches to tackle the problem of predicting human ratings of synthesized speech. The listening test data for the main tra…

    Submitted 3 July, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: Accepted to Interspeech 2022

  47. arXiv:2202.13097  [pdf, ps, other]

    cs.SD eess.AS

    Language-Independent Speaker Anonymization Approach using Self-Supervised Pre-Trained Models

    Authors: Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Natalia Tomashenko

    Abstract: Speaker anonymization aims to protect the privacy of speakers while preserving spoken linguistic information from speech. Current mainstream neural network speaker anonymization systems are complicated, containing an F0 extractor, speaker encoder, automatic speech recognition acoustic model (ASR AM), speech synthesis acoustic model and speech waveform generation model. Moreover, as an ASR AM is la…

    Submitted 27 April, 2022; v1 submitted 26 February, 2022; originally announced February 2022.

  48. arXiv:2202.12233  [pdf, other]

    eess.AS cs.SD

    Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation

    Authors: Hemlata Tak, Massimiliano Todisco, Xin Wang, Jee-weon Jung, Junichi Yamagishi, Nicholas Evans

    Abstract: The performance of spoofing countermeasure systems depends fundamentally upon the use of sufficiently representative training data. With this usually being limited, current solutions typically lack generalisation to attacks encountered in the wild. Strategies to improve reliability in the face of uncontrolled, unpredictable attacks are hence needed. We report in this paper our efforts to use self-…

    Submitted 28 February, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

    Comments: Submitted to Speaker Odyssey Workshop 2022

  49. arXiv:2202.06228  [pdf, other]

    cs.CV

    Robust Deepfake On Unrestricted Media: Generation And Detection

    Authors: Trung-Nghia Le, Huy H Nguyen, Junichi Yamagishi, Isao Echizen

    Abstract: Recent advances in deep learning have led to substantial improvements in deepfake generation, resulting in fake media with a more realistic appearance. Although deepfake media have potential application in a wide range of areas and are drawing much attention from both the academic and industrial communities, it also leads to serious social and criminal concerns. This chapter explores the evolution…

    Submitted 13 February, 2022; originally announced February 2022.

    Comments: This article will appear as one chapter for a new book called Frontiers in Fake Media Generation and Detection, edited by Mahdi Khosravy, Isao Echizen, and Noboru Babaguchi

  50. arXiv:2201.09709  [pdf, other]

    cs.SD cs.CR cs.LG eess.AS

    Optimizing Tandem Speaker Verification and Anti-Spoofing Systems

    Authors: Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen, Junichi Yamagishi

    Abstract: As automatic speaker verification (ASV) systems are vulnerable to spoofing attacks, they are typically used in conjunction with spoofing countermeasure (CM) systems to improve security. For example, the CM can first determine whether the input is human speech, then the ASV can determine whether this speech matches the speaker's identity. The performance of such a tandem system can be measured with…

    Submitted 24 January, 2022; originally announced January 2022.

    Comments: Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing. Published version available at: https://ieeexplore.ieee.org/document/9664367

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 477-488, 2022