Search | arXiv e-print repository

Speaker anonymization using neural audio codec language models

Authors: Michele Panariello, Francesco Nespoli, Massimiliano Todisco, Nicholas Evans

Abstract: The vast majority of approaches to speaker anonymization involve the extraction of fundamental frequency estimates, linguistic features and a speaker embedding which is perturbed to obfuscate the speaker identity before an anonymized speech waveform is resynthesized using a vocoder. Recent work has shown that x-vector transformations are difficult to control consistently: other sources of speaker… ▽ More The vast majority of approaches to speaker anonymization involve the extraction of fundamental frequency estimates, linguistic features and a speaker embedding which is perturbed to obfuscate the speaker identity before an anonymized speech waveform is resynthesized using a vocoder. Recent work has shown that x-vector transformations are difficult to control consistently: other sources of speaker information contained within fundamental frequency and linguistic features are re-entangled upon vocoding, meaning that anonymized speech signals still contain speaker information. We propose an approach based upon neural audio codecs (NACs), which are known to generate high-quality synthetic speech when combined with language models. NACs use quantized codes, which are known to effectively bottleneck speaker-related information: we demonstrate the potential of speaker anonymization systems based on NAC language modeling by applying the evaluation framework of the Voice Privacy Challenge 2022. △ Less

Submitted 12 January, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

Comments: Accepted at ICASSP 2024

arXiv:2306.16071 [pdf, other]

Long-term Conversation Analysis: Exploring Utility and Privacy

Authors: Francesco Nespoli, Jule Pohlhausen, Patrick A. Naylor, Joerg Bitzer

Abstract: The analysis of conversations recorded in everyday life requires privacy protection. In this contribution, we explore a privacy-preserving feature extraction method based on input feature dimension reduction, spectral smoothing and the low-cost speaker anonymization technique based on McAdams coefficient. We assess the utility of the feature extraction methods with a voice activity detection and a… ▽ More The analysis of conversations recorded in everyday life requires privacy protection. In this contribution, we explore a privacy-preserving feature extraction method based on input feature dimension reduction, spectral smoothing and the low-cost speaker anonymization technique based on McAdams coefficient. We assess the utility of the feature extraction methods with a voice activity detection and a speaker diarization system, while privacy protection is determined with a speech recognition and a speaker verification model. We show that the combination of McAdams coefficient and spectral smoothing maintains the utility while improving privacy. △ Less

Submitted 28 June, 2023; originally announced June 2023.

Comments: Submitted to ITG Conference on Speech Communication, 2023

arXiv:2306.16069 [pdf, other]

Two-Stage Voice Anonymization for Enhanced Privacy

Authors: Francesco Nespoli, Daniel Barreda, Joerg Bitzer, Patrick A. Naylor

Abstract: In recent years, the need for privacy preservation when manipulating or storing personal data, including speech , has become a major issue. In this paper, we present a system addressing the speaker-level anonymization problem. We propose and evaluate a two-stage anonymization pipeline exploiting a state-of-the-art anonymization model described in the Voice Privacy Challenge 2022 in combination wit… ▽ More In recent years, the need for privacy preservation when manipulating or storing personal data, including speech , has become a major issue. In this paper, we present a system addressing the speaker-level anonymization problem. We propose and evaluate a two-stage anonymization pipeline exploiting a state-of-the-art anonymization model described in the Voice Privacy Challenge 2022 in combination with a zero-shot voice conversion architecture able to capture speaker characteristics from a few seconds of speech. We show this architecture can lead to strong privacy preservation while preserving pitch information. Finally, we propose a new compressed metric to evaluate anonymization systems in privacy scenarios with different constraints on privacy and utility. △ Less

Submitted 28 June, 2023; originally announced June 2023.

Comments: submitted to INTERSPEECH

arXiv:2212.01306 [pdf, other]

Relative Acoustic Features for Distance Estimation in Smart-Homes

Authors: Francesco Nespoli, Daniel Barreda, Patrick A. Naylor

Abstract: Any audio recording encapsulates the unique fingerprint of the associated acoustic environment, namely the background noise and reverberation. Considering the scenario of a room equipped with a fixed smart speaker device with one or more microphones and a wearable smart device (watch, glasses or smartphone), we employed the improved proportionate normalized least mean square adaptive filter to est… ▽ More Any audio recording encapsulates the unique fingerprint of the associated acoustic environment, namely the background noise and reverberation. Considering the scenario of a room equipped with a fixed smart speaker device with one or more microphones and a wearable smart device (watch, glasses or smartphone), we employed the improved proportionate normalized least mean square adaptive filter to estimate the relative room impulse response map** the audio recordings of the two devices. We performed inter-device distance estimation by exploiting a new set of features obtained extending the definition of some acoustic attributes of the room impulse response to its relative version. In combination with the sparseness measure of the estimated relative room impulse response, the relative features allow precise inter-device distance estimation which can be exploited for tasks such as best microphone selection or acoustic scene analysis. Experimental results from simulated rooms of different dimensions and reverberation times demonstrate the effectiveness of this computationally lightweight approach for smart home acoustic ranging applications △ Less

Submitted 2 December, 2022; originally announced December 2022.

Journal ref: Interspeech 2022

Showing 1–4 of 4 results for author: Nespoli, F