-
Improved disentangled speech representations using contrastive learning in factorized hierarchical variational autoencoder
Authors:
Yuying Xie,
Thomas Arildsen,
Zheng-Hua Tan
Abstract:
Leveraging the fact that speaker identity and content vary on different time scales, the factorized hierarchical variational autoencoder (FHVAE) uses different latent variables to represent these two attributes. Disentanglement of these attributes is achieved through different prior settings for the corresponding latent variables. For the prior of the speaker identity variable, FHVAE assumes a Gaussian distribution with a mean that varies at the utterance scale and a fixed variance. By setting a small fixed variance, the training process encourages the identity variables within one utterance to gather close to the mean of their prior. However, this constraint is relatively weak, as the mean of the prior changes between utterances. Therefore, we introduce contrastive learning into the FHVAE framework, so that the speaker identity variables cluster together when representing the same speaker while distancing themselves as far as possible from those of other speakers. The model structure is not changed in this work, only the training process, so no additional cost is incurred during testing. Voice conversion has been chosen as the application in this paper. Latent variable evaluations include speaker verification and identification for the speaker identity variable, and speech recognition for the content variable. Furthermore, voice conversion performance is assessed through fake speech detection experiments. Results show that the proposed method improves both speaker identity and content feature extraction compared to FHVAE, and outperforms the baseline on conversion.
Submitted 14 June, 2023; v1 submitted 15 November, 2022;
originally announced November 2022.
-
Complex Recurrent Variational Autoencoder with Application to Speech Enhancement
Authors:
Yuying Xie,
Thomas Arildsen,
Zheng-Hua Tan
Abstract:
As an extension of the variational autoencoder (VAE), the complex VAE uses complex Gaussian distributions to model latent variables and data. This work proposes a complex recurrent VAE framework in which a complex-valued recurrent neural network and an L1 reconstruction loss are used. First, to account for the temporal properties of speech signals, this work introduces a complex-valued recurrent neural network into the complex VAE framework. In addition, the L1 loss is used as the reconstruction loss in this framework. To exemplify the use of the complex generative model in speech processing, we choose speech enhancement as the specific application in this paper. Experiments are based on the TIMIT dataset. The results show that the proposed method offers improvements on objective metrics of speech intelligibility and signal quality.
Submitted 12 May, 2023; v1 submitted 5 April, 2022;
originally announced April 2022.
-
Disentangled Speech Representation Learning Based on Factorized Hierarchical Variational Autoencoder with Self-Supervised Objective
Authors:
Yuying Xie,
Thomas Arildsen,
Zheng-Hua Tan
Abstract:
Disentangled representation learning aims to extract explanatory features or factors and retain salient information. The factorized hierarchical variational autoencoder (FHVAE) presents a way to disentangle a speech signal into sequential-level and segmental-level features, which represent speaker identity and speech content information, respectively. As a self-supervised objective, autoregressive predictive coding (APC), on the other hand, has been used to extract meaningful and transferable speech features for multiple downstream tasks. Inspired by the success of these two representation learning methods, this paper proposes to integrate the APC objective into the FHVAE framework in order to benefit from the additional self-supervision target. The main proposed method requires neither more training data nor more computational cost at test time, but obtains improved meaningful representations while maintaining disentanglement. The experiments were conducted on the TIMIT dataset. Results demonstrate that FHVAE equipped with the additional self-supervised objective is able to learn features providing superior performance for tasks including speech recognition and speaker recognition. Furthermore, voice conversion, as one application of disentangled representation learning, has been applied and evaluated. The results show that the new framework performs similarly to the baseline on voice conversion.
Submitted 5 April, 2022;
originally announced April 2022.
-
Generalised Approximate Message Passing for Non-I.I.D. Sparse Signals
Authors:
Christian Schou Oxvig,
Thomas Arildsen
Abstract:
Generalised approximate message passing (GAMP) is an approximate Bayesian estimation algorithm for signals observed through a linear transform with a possibly non-linear subsequent measurement model. By leveraging prior information about the observed signal, such as sparsity in a known dictionary, GAMP can for example reconstruct signals from under-determined measurements - known as compressed sensing. In the sparse signal setting, most existing signal priors for GAMP assume the input signal to have i.i.d. entries. Here we present sparse signal priors for GAMP to estimate non-i.i.d. signals through a non-uniform weighting of the input prior, for example allowing GAMP to support model-based compressed sensing.
Submitted 3 December, 2018;
originally announced December 2018.
-
Sustainable computational science: the ReScience initiative
Authors:
Nicolas P. Rougier,
Konrad Hinsen,
Frédéric Alexandre,
Thomas Arildsen,
Lorena Barba,
Fabien C. Y. Benureau,
C. Titus Brown,
Pierre de Buyl,
Ozan Caglayan,
Andrew P. Davison,
Marc André Delsuc,
Georgios Detorakis,
Alexandra K. Diem,
Damien Drix,
Pierre Enel,
Benoît Girard,
Olivia Guest,
Matt G. Hall,
Rafael Neto Henriques,
Xavier Hinaut,
Kamil S Jaron,
Mehdi Khamassi,
Almar Klein,
Tiina Manninen,
Pietro Marchesi
, et al. (20 additional authors not shown)
Abstract:
Computer science offers a large set of tools for prototyping, writing, running, testing, validating, sharing and reproducing results; however, computational science lags behind. In the best case, authors may provide their source code as a compressed archive and they may feel confident their research is reproducible. But this is not exactly true. Jonathan Buckheit and David Donoho proposed more than two decades ago that an article about computational results is advertising, not scholarship. The actual scholarship is the full software environment, code, and data that produced the result. This implies new workflows, in particular in peer review. Existing journals have been slow to adapt: source code is rarely requested and hardly ever actually executed to check that it produces the results advertised in the article. ReScience is a peer-reviewed journal that targets computational research and encourages the explicit replication of already published research, promoting new and open-source implementations in order to ensure that the original research can be replicated from its description. To achieve this goal, the whole publishing chain is radically different from other traditional scientific journals. ReScience resides on GitHub where each new implementation of a computational study is made available together with comments, explanations, and software tests.
Submitted 11 November, 2017; v1 submitted 14 July, 2017;
originally announced July 2017.
-
Proceedings of the third "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'16)
Authors:
V. Abrol,
O. Absil,
P. -A. Absil,
S. Anthoine,
P. Antoine,
T. Arildsen,
N. Bertin,
F. Bleichrodt,
J. Bobin,
A. Bol,
A. Bonnefoy,
F. Caltagirone,
V. Cambareri,
C. Chenot,
V. Crnojević,
M. Daňková,
K. Degraux,
J. Eisert,
J. M. Fadili,
M. Gabrié,
N. Gac,
D. Giacobello,
A. Gonzalez,
C. A. Gomez Gonzalez,
A. González
, et al. (36 additional authors not shown)
Abstract:
The third edition of the "international - Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST) took place in Aalborg, the 4th largest city in Denmark situated beautifully in the northern part of the country, from the 24th to 26th of August 2016. The workshop venue was at the Aalborg University campus. One implicit objective of this biennial workshop is to foster collaboration between international scientific teams by disseminating ideas through both specific oral/poster presentations and free discussions. For this third edition, iTWIST'16 gathered about 50 international participants and featured 8 invited talks, 12 oral presentations, and 12 posters on the following themes, all related to the theory, application and generalization of the "sparsity paradigm": Sparsity-driven data sensing and processing (e.g., optics, computer vision, genomics, biomedical, digital communication, channel estimation, astronomy); Application of sparse models in non-convex/non-linear inverse problems (e.g., phase retrieval, blind deconvolution, self calibration); Approximate probabilistic inference for sparse problems; Sparse machine learning and inference; "Blind" inverse problems and dictionary learning; Optimization for sparse modelling; Information theory, geometry and randomness; Sparsity? What's next? (Discrete-valued signals; Union of low-dimensional spaces, Cosparsity, mixed/group norm, model-based, low-complexity models, ...); Matrix/manifold sensing/processing (graph, low-rank approximation, ...); Complexity/accuracy tradeoffs in numerical methods/optimization; Electronic/optical compressive sensors (hardware).
Submitted 14 September, 2016;
originally announced September 2016.
-
Frequency Selective Compressed Sensing
Authors:
Jacek Pierzchlewski,
Thomas Arildsen
Abstract:
In this paper the authors describe the problem of acquisition of interfered signals and formulate a filtering problem. A frequency-selective compressed sensing technique is proposed as a solution to this problem. Signal acquisition is critical in facilitating frequency-selective compressed sensing. The authors propose a filtering compressed sensing parameter, which makes it possible to assess whether a given acquisition process makes frequency-selective compressed sensing possible for a given filtering problem. A numerical experiment is conducted to show how the described method works in practice.
Submitted 16 January, 2015; v1 submitted 8 January, 2015;
originally announced January 2015.
-
Generation and Analysis of Constrained Random Sampling Patterns
Authors:
Jacek Pierzchlewski,
Thomas Arildsen
Abstract:
Random sampling is a technique for signal acquisition which is gaining popularity in practical signal processing systems. Nowadays, event-driven analog-to-digital converters (ADCs) make random sampling feasible in practical applications. A process of random sampling is defined by a sampling pattern, which indicates signal sampling points in time. Practical random sampling patterns are constrained by ADC characteristics and application requirements. In this paper the authors introduce statistical methods which evaluate random sampling pattern generators with emphasis on practical applications. Furthermore, the authors propose a new random pattern generator which copes with strict practical limitations imposed on patterns, with possibly minimal loss in randomness of sampling. The proposed generator is compared with existing sampling pattern generators using the introduced statistical methods. It is shown that the proposed algorithm generates random sampling patterns intended for event-driven ADCs better than existing sampling pattern generators. Finally, implementation issues of random sampling patterns are discussed.
Submitted 7 October, 2015; v1 submitted 3 September, 2014;
originally announced September 2014.
-
Compressed Sensing Based Direct Conversion Receiver With Interference Reducing Sampling
Authors:
Jacek Pierzchlewski,
Thomas Arildsen,
Torben Larsen
Abstract:
This paper describes a direct conversion receiver applying compressed sensing with the objective to relax the analog filtering requirements seen in the traditional architecture. The analog filter is cumbersome in an integrated circuit (IC) design and relaxing its requirements is an advantage in terms of die area, performance and robustness of the receiver. The objective is met by a selection of sampling pattern matched to the prior knowledge of the frequency placement of the desired and interfering signals. A simple numerical example demonstrates the principle. The work is part of an ongoing research effort and the different project phases are explained.
Submitted 23 April, 2014;
originally announced April 2014.
-
Model-Based Calibration of Filter Imperfections in the Random Demodulator for Compressive Sensing
Authors:
Pawel Jerzy Pankiewicz,
Thomas Arildsen,
Torben Larsen
Abstract:
The random demodulator is a recent compressive sensing architecture providing efficient sub-Nyquist sampling of sparse band-limited signals. The compressive sensing paradigm requires an accurate model of the analog front-end to enable correct signal reconstruction in the digital domain. In practice, hardware devices such as filters deviate from their desired design behavior due to component variations. Existing reconstruction algorithms are sensitive to such deviations, which fall into the more general category of measurement matrix perturbations. This paper proposes a model-based technique that aims to calibrate filter model mismatches to facilitate improved signal reconstruction quality. The mismatch is considered to be an additive error in the discretized impulse response. We identify the error by sampling a known calibrating signal, enabling least-squares estimation of the impulse response error. The error estimate and the known system model are used to calibrate the measurement matrix. Numerical analysis demonstrates the effectiveness of the calibration method even for highly deviating low-pass filter responses. The performance of the proposed method is also compared to a state-of-the-art method based on discrete Fourier transform trigonometric interpolation.
Submitted 25 March, 2013;
originally announced March 2013.
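The least-squares step mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the filter output is observed directly and without noise (the full random demodulator chain with its measurement matrix is left out), and the names `convolution_matrix`, `estimate_impulse_response_error`, `h_nominal` and `e_true` are hypothetical, chosen only for this illustration.

```python
import numpy as np


def convolution_matrix(x, n_taps):
    """Build X such that X @ h equals np.convolve(x, h) for len(h) == n_taps."""
    n_out = len(x) + n_taps - 1
    X = np.zeros((n_out, n_taps))
    for k in range(n_taps):
        # Column k is the calibrating signal delayed by k samples.
        X[k:k + len(x), k] = x
    return X


def estimate_impulse_response_error(x, y, h_nominal):
    """Least-squares estimate of the additive error e in h = h_nominal + e.

    Assumed model (illustrative only): y = convolve(x, h_nominal + e).
    """
    X = convolution_matrix(x, len(h_nominal))
    # Whatever the nominal model cannot explain is attributed to the error e.
    residual = y - X @ h_nominal
    e_hat, *_ = np.linalg.lstsq(X, residual, rcond=None)
    return e_hat


# Hypothetical example values.
rng = np.random.default_rng(0)
h_nominal = np.array([0.25, 0.5, 0.25])   # designed impulse response
e_true = np.array([0.02, -0.03, 0.01])    # additive deviation (unknown in practice)
x = rng.standard_normal(64)               # known calibrating signal
y = np.convolve(x, h_nominal + e_true)    # observed filter output

e_hat = estimate_impulse_response_error(x, y, h_nominal)
h_calibrated = h_nominal + e_hat          # corrected filter model
```

In this noiseless toy setting the estimate matches the true deviation up to numerical precision; with noisy observations the same least-squares solve would return an approximation instead.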
-
Compressed Sensing with Linear Correlation Between Signal and Measurement Noise
Authors:
Thomas Arildsen,
Torben Larsen
Abstract:
Existing convex relaxation-based approaches to reconstruction in compressed sensing assume that noise in the measurements is independent of the signal of interest. We consider the case of noise being linearly correlated with the signal and introduce a simple technique for improving compressed sensing reconstruction from such measurements. The technique is based on a linear model of the correlation of additive noise with the signal. The modification of the reconstruction algorithm based on this model is very simple and has negligible additional computational cost compared to standard reconstruction algorithms, but is not known in existing literature. The proposed technique reduces reconstruction error considerably in the case of linearly correlated measurements and noise. Numerical experiments confirm the efficacy of the technique. The technique is demonstrated with application to low-rate quantization of compressed measurements, which is known to introduce correlated noise, and improvements in reconstruction error compared to ordinary Basis Pursuit De-Noising of up to approximately 7 dB are observed for 1 bit/sample quantization. Furthermore, the proposed method is compared to Binary Iterative Hard Thresholding which it is demonstrated to outperform in terms of reconstruction error for sparse signals with a number of non-zero coefficients greater than approximately 1/10th of the number of compressed measurements.
Submitted 7 November, 2013; v1 submitted 2 January, 2013;
originally announced January 2013.
-
Improving Smoothed l0 Norm in Compressive Sensing Using Adaptive Parameter Selection
Authors:
Christian Schou Oxvig,
Patrick Steffen Pedersen,
Thomas Arildsen,
Torben Larsen
Abstract:
Signal reconstruction in compressive sensing involves finding a sparse solution that satisfies a set of linear constraints. Several approaches to this problem have been considered in existing reconstruction algorithms. They each provide a trade-off between reconstruction capabilities and required computation time. In an attempt to push the limits for this trade-off, we consider a smoothed l0 norm (SL0) algorithm in a noiseless setup. We argue that using a set of carefully chosen parameters in our proposed adaptive SL0 algorithm may result in significantly better reconstruction capabilities in terms of phase transition while retaining the same required computation time as existing SL0 algorithms. A large set of simulations further supports this claim. Simulations even reveal that the theoretical l1 curve may be surpassed in major parts of the phase space.
Submitted 14 March, 2013; v1 submitted 16 October, 2012;
originally announced October 2012.
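For readers unfamiliar with the smoothed l0 approach referred to in the abstract above, the following is a minimal sketch of a basic SL0 iteration in the noiseless setting. It uses fixed, illustrative parameter values (`sigma_min`, `sigma_decrease`, `mu`, `inner_iters` are assumptions made here); it does not implement the adaptive parameter selection that the paper proposes.

```python
import numpy as np


def sl0(A, y, sigma_min=1e-4, sigma_decrease=0.5, mu=2.0, inner_iters=3):
    """Basic smoothed-l0 sketch for y = A @ x with sparse x (noiseless case).

    All parameter defaults are illustrative assumptions, not the paper's
    adaptive choices.
    """
    A_pinv = np.linalg.pinv(A)
    x = A_pinv @ y                      # start from the minimum l2-norm solution
    sigma = 2.0 * np.max(np.abs(x))     # initial smoothing width
    while sigma > sigma_min:
        for _ in range(inner_iters):
            # Gradient-type step on the Gaussian-smoothed l0 surrogate:
            # entries small relative to sigma are pushed toward zero.
            delta = x * np.exp(-x**2 / (2.0 * sigma**2))
            x = x - mu * delta
            # Project back onto the feasible set {x : A @ x = y}.
            x = x - A_pinv @ (A @ x - y)
        sigma *= sigma_decrease         # gradually sharpen the approximation
    return x


# Hypothetical test problem: 5-sparse signal, 40 measurements, length 100.
rng = np.random.default_rng(1)
m, n, k = 40, 100, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
y = A @ x_true

x_hat = sl0(A, y)
print("constraint residual:", np.linalg.norm(A @ x_hat - y))
print("reconstruction error:", np.linalg.norm(x_hat - x_true))
```

The projection step keeps every iterate consistent with the measurements, so the quantities that trade reconstruction quality against computation time are the smoothing schedule and the step size, which is where an adaptive scheme would intervene.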
-
Demodulating Subsampled Direct Sequence Spread Spectrum Signals using Compressive Signal Processing
Authors:
Karsten Fyhn,
Thomas Arildsen,
Torben Larsen,
Søren Holdt Jensen
Abstract:
We show that to lower the sampling rate in a spread spectrum communication system using Direct Sequence Spread Spectrum (DSSS), compressive signal processing can be applied to demodulate the received signal. This may lead to a decrease in the power consumption or the manufacturing price of wireless receivers using spread spectrum technology. The main novelty of this paper is the discovery that in spread spectrum systems it is possible to apply compressive sensing with a much simpler hardware architecture than in other systems, making the implementation both simpler and more energy efficient. Our theoretical work is exemplified with a numerical experiment using the IEEE 802.15.4 standard's 2.4 GHz band specification. The numerical results support our theoretical findings and indicate that compressive sensing may be used successfully in spread spectrum communication systems. The results obtained here may also be applicable in other spread spectrum technologies, such as Code Division Multiple Access (CDMA) systems.
Submitted 10 October, 2012; v1 submitted 24 October, 2011;
originally announced October 2011.