arXiv:2211.08191

Improved disentangled speech representations using contrastive learning in factorized hierarchical variational autoencoder

Authors: Yuying Xie, Thomas Arildsen, Zheng-Hua Tan

Abstract: Leveraging the fact that speaker identity and content vary on different time scales, the factorized hierarchical variational autoencoder (FHVAE) uses different latent variables to represent these two attributes. Disentanglement of these attributes is achieved through different prior settings for the corresponding latent variables. For the prior of the speaker identity variable, FHVAE assumes a Gaussian distribution with a mean that varies at the utterance scale and a fixed variance. By setting a small fixed variance, the training process encourages identity variables within one utterance to cluster close to the mean of their prior. However, this constraint is relatively weak, as the mean of the prior changes between utterances. Therefore, we introduce contrastive learning into the FHVAE framework, so that speaker identity variables representing the same speaker cluster together while staying as far as possible from those of other speakers. Only the training process is changed in this work, not the model structure, so no additional cost is incurred at test time. Voice conversion has been chosen as the application in this paper. Latent variable evaluations include speaker verification and identification for the speaker identity variable, and speech recognition for the content variable. Furthermore, voice conversion performance is assessed through fake speech detection experiments.
Results show that the proposed method improves both speaker identity and content feature extraction compared to FHVAE, and outperforms the baseline on voice conversion.
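To make "pull same-speaker identity variables together, push other speakers apart" concrete, here is a minimal sketch of a supervised contrastive (SupCon-style) loss over a batch of speaker identity vectors. The abstract does not give the exact contrastive objective, the similarity measure, the temperature, or how the term is weighted against the FHVAE lower bound, so the function name `supervised_contrastive_loss`, the use of cosine similarity, and `temperature=0.1` are illustrative assumptions rather than the paper's method.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def supervised_contrastive_loss(embeddings, speakers, temperature=0.1):
    """Average supervised contrastive loss over anchors that have a positive.

    embeddings: speaker identity vectors (assumed: one per segment/utterance)
    speakers:   speaker labels, same length as embeddings
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        # Temperature-scaled similarity from anchor i to every other sample.
        logits = {k: cosine(embeddings[i], embeddings[k]) / temperature
                  for k in range(n) if k != i}
        # Log of the softmax denominator, via log-sum-exp for stability.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        positives = [k for k in logits if speakers[k] == speakers[i]]
        if not positives:
            continue  # anchors without a same-speaker partner are skipped
        # Mean negative log-probability of "selecting" a same-speaker sample.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors

emb = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
matched = supervised_contrastive_loss(emb, ["A", "A", "B", "B"])
shuffled = supervised_contrastive_loss(emb, ["A", "B", "A", "B"])
```

When the labels agree with the geometry (`matched`), the loss is lower than with shuffled labels. In training, a term of this kind would presumably be added to the FHVAE objective with some weight, which is also an assumption here.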

Submitted 14 June, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

Comments: accepted by EUSIPCO 2023

arXiv:2204.02195

Complex Recurrent Variational Autoencoder with Application to Speech Enhancement

Authors: Yuying Xie, Thomas Arildsen, Zheng-Hua Tan

Abstract: As an extension of the variational autoencoder (VAE), the complex VAE uses complex Gaussian distributions to model latent variables and data. This work proposes a complex recurrent VAE framework that uses a complex-valued recurrent neural network and an L1 reconstruction loss. First, to account for the temporal structure of speech signals, this work introduces a complex-valued recurrent neural network into the complex VAE framework. In addition, the L1 loss is used as the reconstruction loss in this framework. To exemplify the use of the complex generative model in speech processing, we choose speech enhancement as the specific application in this paper. Experiments are based on the TIMIT dataset. The results show that the proposed method improves objective metrics of speech intelligibility and signal quality.
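The two loss ingredients named above, an L1 reconstruction loss on complex-valued data and a KL term for complex Gaussian latents, can be sketched as follows. This assumes circularly symmetric (proper) complex Gaussians, for which KL(CN(mu, var) || CN(0, 1)) = var + |mu|^2 - 1 - ln(var); the paper's complex VAE may instead use an improper Gaussian with a pseudo-covariance, and the complex-valued recurrent network itself is not shown. All function names are hypothetical.

```python
import math
import random

def complex_l1_loss(x, x_hat):
    """Mean modulus error between complex sequences (e.g. STFT bins)."""
    return sum(abs(a - b) for a, b in zip(x, x_hat)) / len(x)

def kl_circular_complex_gaussian(mu, var):
    """KL( CN(mu, var) || CN(0, 1) ) for one circularly symmetric complex latent.

    Assumption: proper complex Gaussian, so the closed form is
    var + |mu|^2 - 1 - ln(var).
    """
    return var + abs(mu) ** 2 - 1.0 - math.log(var)

def sample_circular_complex(mu, var, rng):
    """Reparameterised sample z = mu + sqrt(var / 2) * (e1 + j*e2), e1, e2 ~ N(0, 1).

    Splitting the variance equally between real and imaginary parts gives
    E|z - mu|^2 = var, as required for CN(mu, var).
    """
    scale = math.sqrt(var / 2.0)
    return mu + scale * complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))

rng = random.Random(0)
z = sample_circular_complex(1 + 2j, 0.5, rng)
loss = complex_l1_loss([3 + 4j], [0j]) + kl_circular_complex_gaussian(0j, 1.0)
```

Here `loss` is 5.0: the modulus of 3+4j plus a zero KL for a latent that already matches the prior.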

Submitted 12 May, 2023; v1 submitted 5 April, 2022; originally announced April 2022.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2204.02166

doi 10.1109/MLSP52302.2021.9596320

Disentangled Speech Representation Learning Based on Factorized Hierarchical Variational Autoencoder with Self-Supervised Objective

Authors: Yuying Xie, Thomas Arildsen, Zheng-Hua Tan

Abstract: Disentangled representation learning aims to extract explanatory features or factors and retain salient information. Factorized hierarchical variational autoencoder (FHVAE) presents a way to disentangle a speech signal into sequential-level and segmental-level features, which represent speaker identity and speech content information, respectively. Autoregressive predictive coding (APC), a self-supervised objective, has in turn been used to extract meaningful and transferable speech features for multiple downstream tasks. Inspired by the success of these two representation learning methods, this paper proposes to integrate the APC objective into the FHVAE framework to benefit from the additional self-supervision target. The proposed method requires neither more training data nor more computational cost at test time, yet obtains more meaningful representations while maintaining disentanglement. The experiments were conducted on the TIMIT dataset. Results demonstrate that FHVAE equipped with the additional self-supervised objective learns features that perform better on tasks including speech recognition and speaker recognition. Furthermore, voice conversion, as one application of disentangled representation learning, has been applied and evaluated. On voice conversion, the new framework performs similarly to the baseline.
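The APC objective referred to above trains a causal model to predict the frame a fixed number of steps ahead from the frames seen so far. A minimal sketch of that loss, assuming an L1 distance and a generic `predict` callable, is given below; how the APC term is weighted against the FHVAE bound, and which representation the predictor reads from, are not stated in the abstract, and `copy_last_frame` is a hypothetical placeholder predictor.

```python
def apc_loss(frames, predict, shift=3):
    """Autoregressive predictive coding objective (L1 version, assumed).

    frames:  feature vectors x_1..x_T (e.g. log-Mel frames)
    predict: causal model; predict(frames[:t+1]) estimates frames[t + shift]
    shift:   how many frames ahead to predict
    """
    T = len(frames)
    if T <= shift:
        raise ValueError("sequence shorter than prediction shift")
    total = 0.0
    for t in range(T - shift):
        # Only the current and past frames are visible to the predictor.
        guess = predict(frames[: t + 1])
        target = frames[t + shift]
        total += sum(abs(g - y) for g, y in zip(guess, target))
    return total / (T - shift)

def copy_last_frame(history):
    """Hypothetical trivial predictor used only to exercise apc_loss."""
    return history[-1]

ramp = [[float(t)] for t in range(10)]
ramp_loss = apc_loss(ramp, copy_last_frame, shift=3)
```

On a linear ramp, copying the last frame is always wrong by exactly `shift`, so `ramp_loss` equals 3.0; a trained recurrent predictor would replace `copy_last_frame`.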

Submitted 5 April, 2022; originally announced April 2022.

Comments: Published in: 2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)

arXiv:1812.00909

Generalised Approximate Message Passing for Non-I.I.D. Sparse Signals

Authors: Christian Schou Oxvig, Thomas Arildsen

Abstract: Generalised approximate message passing (GAMP) is an approximate Bayesian estimation algorithm for signals observed through a linear transform with a possibly non-linear subsequent measurement model. By leveraging prior information about the observed signal, such as sparsity in a known dictionary, GAMP can, for example, reconstruct signals from under-determined measurements, which is known as compressed sensing. In the sparse signal setting, most existing signal priors for GAMP assume the input signal to have i.i.d. entries. Here we present sparse signal priors for GAMP to estimate non-i.i.d. signals through a non-uniform weighting of the input prior, for example allowing GAMP to support model-based compressed sensing.
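One way to picture a non-uniformly weighted sparse prior is the GAMP input (denoising) step for a Bernoulli-Gaussian prior in which each entry i has its own activity probability rho_i. The sketch below computes the MMSE estimate of x from a pseudo-observation r = x + N(0, tau_r), the scalar quantity GAMP evaluates per entry. The Bernoulli-Gaussian form and the per-entry `rho` list are assumptions for illustration; the paper's exact weighting scheme may differ, and `bg_input_denoiser` is a hypothetical name.

```python
import math

def gauss_pdf(x, var):
    """Zero-mean Gaussian density N(x; 0, var)."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def bg_input_denoiser(r, tau_r, rho, sigma2_x):
    """Posterior mean/variance of x_i given r_i = x_i + N(0, tau_r).

    Assumed prior: x_i ~ (1 - rho_i) * delta(x_i) + rho_i * N(0, sigma2_x).
    A list rho with different values per entry gives a non-i.i.d. prior;
    a constant list recovers the usual i.i.d. case.
    """
    means, variances = [], []
    for r_i, rho_i in zip(r, rho):
        # Posterior probability that entry i is active (non-zero).
        active = rho_i * gauss_pdf(r_i, sigma2_x + tau_r)
        inactive = (1.0 - rho_i) * gauss_pdf(r_i, tau_r)
        if active + inactive == 0.0:  # both densities underflowed
            pi = 1.0 if rho_i > 0 else 0.0
        else:
            pi = active / (active + inactive)
        # Gaussian posterior of x_i conditioned on being active.
        m = r_i * sigma2_x / (sigma2_x + tau_r)
        v = sigma2_x * tau_r / (sigma2_x + tau_r)
        means.append(pi * m)
        variances.append(pi * (v + m * m) - (pi * m) ** 2)
    return means, variances

# Same observation, different prior weights: the entry believed to be
# active more often is shrunk less towards zero.
est, _ = bg_input_denoiser([0.05, 0.05], 0.1, [0.1, 0.9], 1.0)
```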

Submitted 3 December, 2018; originally announced December 2018.

Comments: 3 pages, 1 figure, presented at iTWIST 2018, Marseille

Journal ref: in Proceedings of iTWIST'18, Paper-ID: 24, Marseille, France, November, 21-23, 2018

arXiv:1707.04393

doi 10.7717/peerj-cs.142

Sustainable computational science: the ReScience initiative

Authors: Nicolas P. Rougier, Konrad Hinsen, Frédéric Alexandre, Thomas Arildsen, Lorena Barba, Fabien C. Y. Benureau, C. Titus Brown, Pierre de Buyl, Ozan Caglayan, Andrew P. Davison, Marc André Delsuc, Georgios Detorakis, Alexandra K. Diem, Damien Drix, Pierre Enel, Benoît Girard, Olivia Guest, Matt G. Hall, Rafael Neto Henriques, Xavier Hinaut, Kamil S Jaron, Mehdi Khamassi, Almar Klein, Tiina Manninen, Pietro Marchesi, et al. (20 additional authors not shown)

Abstract: Computer science offers a large set of tools for prototyping, writing, running, testing, validating, sharing and reproducing results; however, computational science lags behind. In the best case, authors may provide their source code as a compressed archive and may feel confident that their research is reproducible. But this is not exactly true. James Buckheit and David Donoho proposed more than two decades ago that an article about computational results is advertising, not scholarship. The actual scholarship is the full software environment, code, and data that produced the result. This implies new workflows, in particular in peer review. Existing journals have been slow to adapt: source code is rarely requested and hardly ever actually executed to check that it produces the results advertised in the article. ReScience is a peer-reviewed journal that targets computational research and encourages the explicit replication of already published research, promoting new and open-source implementations in order to ensure that the original research can be replicated from its description. To achieve this goal, the whole publishing chain is radically different from that of traditional scientific journals. ReScience resides on GitHub, where each new implementation of a computational study is made available together with comments, explanations, and software tests.

Submitted 11 November, 2017; v1 submitted 14 July, 2017; originally announced July 2017.

Comments: 8 pages, 1 figure

Journal ref: PeerJ Computer Science 3:e142 (2017)

arXiv:1609.04167

Proceedings of the third "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'16)

Authors: V. Abrol, O. Absil, P. -A. Absil, S. Anthoine, P. Antoine, T. Arildsen, N. Bertin, F. Bleichrodt, J. Bobin, A. Bol, A. Bonnefoy, F. Caltagirone, V. Cambareri, C. Chenot, V. Crnojević, M. Daňková, K. Degraux, J. Eisert, J. M. Fadili, M. Gabrié, N. Gac, D. Giacobello, A. Gonzalez, C. A. Gomez Gonzalez, A. González, et al. (36 additional authors not shown)

Abstract: The third edition of the "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST) took place in Aalborg, the fourth-largest city in Denmark, beautifully situated in the northern part of the country, from the 24th to 26th of August 2016. The workshop venue was at the Aalborg University campus. One implicit objective of this biennial workshop is to foster collaboration between international scientific teams by disseminating ideas through both specific oral/poster presentations and free discussions. For this third edition, iTWIST'16 gathered about 50 international participants and featured 8 invited talks, 12 oral presentations, and 12 posters on the following themes, all related to the theory, application and generalization of the "sparsity paradigm": Sparsity-driven data sensing and processing (e.g., optics, computer vision, genomics, biomedical, digital communication, channel estimation, astronomy); Application of sparse models in non-convex/non-linear inverse problems (e.g., phase retrieval, blind deconvolution, self calibration); Approximate probabilistic inference for sparse problems; Sparse machine learning and inference; "Blind" inverse problems and dictionary learning; Optimization for sparse modelling; Information theory, geometry and randomness; Sparsity? What's next? (Discrete-valued signals; Union of low-dimensional spaces, Cosparsity, mixed/group norm, model-based, low-complexity models, ...); Matrix/manifold sensing/processing (graph, low-rank approximation, ...); Complexity/accuracy tradeoffs in numerical methods/optimization; Electronic/optical compressive sensors (hardware).

Submitted 14 September, 2016; originally announced September 2016.

Comments: 69 pages, 22 extended abstracts, iTWIST'16 website: http://www.itwist16.es.aau.dk

arXiv:1501.01792

Frequency Selective Compressed Sensing

Authors: Jacek Pierzchlewski, Thomas Arildsen

Abstract: In this paper, the authors describe the problem of acquiring signals subject to interference and formulate a filtering problem. A frequency-selective compressed sensing technique is proposed as a solution to this problem. Signal acquisition is critical in facilitating frequency-selective compressed sensing. The authors propose a filtering compressed sensing parameter, which allows one to assess whether a given acquisition process makes frequency-selective compressed sensing possible for a given filtering problem. A numerical experiment is conducted to show how the described method works in practice.

Submitted 16 January, 2015; v1 submitted 8 January, 2015; originally announced January 2015.

Comments: IEEE notice page + 4 pages + references page, 7 figures, submitted to IEEE Signal Processing Letters

arXiv:1409.1002

Generation and Analysis of Constrained Random Sampling Patterns

Authors: Jacek Pierzchlewski, Thomas Arildsen

Abstract: Random sampling is a signal acquisition technique that is gaining popularity in practical signal processing systems. Nowadays, event-driven analog-to-digital converters make random sampling feasible in practical applications. A random sampling process is defined by a sampling pattern, which indicates the signal sampling points in time. Practical random sampling patterns are constrained by ADC characteristics and application requirements. In this paper, the authors introduce statistical methods that evaluate random sampling pattern generators with an emphasis on practical applications. Furthermore, the authors propose a new random pattern generator that copes with strict practical limitations imposed on patterns, with minimal loss of sampling randomness. The proposed generator is compared with existing sampling pattern generators using the introduced statistical methods. It is shown that the proposed algorithm generates random sampling patterns for event-driven ADCs better than existing sampling pattern generators. Finally, implementation issues of random sampling patterns are discussed.
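To illustrate what "constrained" means for a sampling pattern, the sketch below draws sampling instants on a discrete time grid so that consecutive samples are never closer than `t_min` grid points (for instance, a converter's recovery time) nor farther apart than `t_max`. This is plain additive random sampling with bounded gaps, chosen only as a hypothetical baseline; it is not the generator proposed in the paper, and the parameter names are assumptions.

```python
import random

def constrained_sampling_pattern(n_grid, t_min, t_max, seed=None):
    """Random sampling instants on a grid of n_grid points with bounded spacing.

    Hypothetical illustration: each gap between consecutive samples is drawn
    uniformly from [t_min, t_max] grid points (additive random sampling), and
    the pattern ends when the next sample would fall past the grid.
    """
    if not 1 <= t_min <= t_max:
        raise ValueError("require 1 <= t_min <= t_max")
    rng = random.Random(seed)
    # First sample somewhere among the first t_max grid points.
    pattern = [rng.randrange(min(t_max, n_grid))]
    while True:
        nxt = pattern[-1] + rng.randint(t_min, t_max)
        if nxt >= n_grid:
            return pattern
        pattern.append(nxt)

pattern = constrained_sampling_pattern(1000, 3, 9, seed=1)
gaps = [b - a for a, b in zip(pattern, pattern[1:])]
```

Every gap respects the constraints by construction; the statistical question the paper addresses is how much randomness such constraints cost, which this sketch does not measure.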

Submitted 7 October, 2015; v1 submitted 3 September, 2014; originally announced September 2014.

Comments: 29 pages, 12 figures, submitted to Circuits, Systems and Signal Processing journal

arXiv:1404.6150

doi 10.1109/SAHCN.2014.6990352

Compressed Sensing Based Direct Conversion Receiver With Interference Reducing Sampling

Authors: Jacek Pierzchlewski, Thomas Arildsen, Torben Larsen

Abstract: This paper describes a direct conversion receiver applying compressed sensing with the objective of relaxing the analog filtering requirements seen in the traditional architecture. The analog filter is cumbersome in an integrated circuit (IC) design, and relaxing its requirements is an advantage in terms of die area, performance and robustness of the receiver. The objective is met by selecting a sampling pattern matched to the prior knowledge of the frequency placement of the desired and interfering signals. A simple numerical example demonstrates the principle. The work is part of an ongoing research effort, and the different project phases are explained.

Submitted 23 April, 2014; originally announced April 2014.

Comments: 3 pages, 5 figures, submitted to IEEE International Conference On Sensing Communication and Networking 2014 (poster)

arXiv:1303.6135

Model-Based Calibration of Filter Imperfections in the Random Demodulator for Compressive Sensing

Authors: Pawel Jerzy Pankiewicz, Thomas Arildsen, Torben Larsen

Abstract: The random demodulator is a recent compressive sensing architecture providing efficient sub-Nyquist sampling of sparse band-limited signals. The compressive sensing paradigm requires an accurate model of the analog front-end to enable correct signal reconstruction in the digital domain. In practice, hardware devices such as filters deviate from their desired design behavior due to component variations. Existing reconstruction algorithms are sensitive to such deviations, which fall into the more general category of measurement matrix perturbations. This paper proposes a model-based technique that aims to calibrate filter model mismatches to facilitate improved signal reconstruction quality. The mismatch is considered to be an additive error in the discretized impulse response. We identify the error by sampling a known calibrating signal, enabling least-squares estimation of the impulse response error. The error estimate and the known system model are used to calibrate the measurement matrix. Numerical analysis demonstrates the effectiveness of the calibration method even for highly deviating low-pass filter responses. The performance of the proposed method is also compared to a state-of-the-art method based on discrete Fourier transform trigonometric interpolation.
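The least-squares step described above can be illustrated in a simplified setting: a known calibration signal c passes through an FIR filter whose impulse response is the nominal h0 plus an unknown additive error e. The residual r = y - c * h0 then equals C e, where C is the convolution (Toeplitz) matrix of c, and e follows from the normal equations C^T C e = C^T r. This sketch assumes a bare convolution model, not the full random demodulator chain (chipping sequence, filter, low-rate sampler) treated in the paper, and all helper names are hypothetical.

```python
def convolve(x, h):
    """Full linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def estimate_impulse_error(calib, h_nominal, y_measured):
    """Least-squares estimate of e in y = calib * (h_nominal + e)."""
    L = len(h_nominal)
    # Residual that the nominal model cannot explain: r = C e.
    r = [y - y0 for y, y0 in zip(y_measured, convolve(calib, h_nominal))]
    # Column j of C is the calibration signal delayed by j samples.
    cols = [[0.0] * j + list(calib) + [0.0] * (L - 1 - j) for j in range(L)]
    ctc = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    ctr = [sum(a * b for a, b in zip(ci, r)) for ci in cols]
    return solve(ctc, ctr)

calib = [1.0, -0.5, 0.25, 0.8, -1.0, 0.3, 0.6, -0.2]
h0 = [0.5, 0.3, 0.1]
e_true = [0.02, -0.05, 0.01]
y_meas = convolve(calib, [a + b for a, b in zip(h0, e_true)])
e_hat = estimate_impulse_error(calib, h0, y_meas)
```

In the noiseless case `e_hat` recovers `e_true`; the calibrated response `h0 + e_hat` would then be used to rebuild the measurement matrix.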

Submitted 25 March, 2013; originally announced March 2013.

Comments: 10 pages, 8 figures, submitted to IEEE Transactions on Signal Processing

arXiv:1301.0213

doi 10.1016/j.sigpro.2013.10.021

Compressed Sensing with Linear Correlation Between Signal and Measurement Noise

Authors: Thomas Arildsen, Torben Larsen

Abstract: Existing convex relaxation-based approaches to reconstruction in compressed sensing assume that noise in the measurements is independent of the signal of interest. We consider the case of noise being linearly correlated with the signal and introduce a simple technique for improving compressed sensing reconstruction from such measurements. The technique is based on a linear model of the correlation of additive noise with the signal. The modification of the reconstruction algorithm based on this model is very simple and has negligible additional computational cost compared to standard reconstruction algorithms, yet it has not been reported in the existing literature. The proposed technique reduces reconstruction error considerably in the case of linearly correlated measurements and noise. Numerical experiments confirm the efficacy of the technique. The technique is demonstrated with application to low-rate quantization of compressed measurements, which is known to introduce correlated noise, and improvements in reconstruction error of up to approximately 7 dB compared to ordinary Basis Pursuit De-Noising are observed for 1 bit/sample quantization. Furthermore, the proposed method is compared to Binary Iterative Hard Thresholding and is shown to outperform it in terms of reconstruction error for sparse signals whose number of non-zero coefficients exceeds approximately 1/10 of the number of compressed measurements.
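One standard way to write quantization noise as "linearly correlated with the signal" is a Bussgang-style decomposition q = g*y + d, where g = E[q y] / E[y^2] makes the remainder d uncorrelated with y. For zero-mean Gaussian y with standard deviation sigma and a 1-bit (sign) quantizer, g = sqrt(2/pi) / sigma. The sketch estimates g empirically. Whether this matches the paper's exact linear model is an assumption; the idea is that a reconstruction method could then use g times the measurement matrix as the effective matrix instead of treating all quantization error as independent noise.

```python
import math
import random

def linear_gain(signal, quantized):
    """Least-squares gain g such that quantized = g * signal + d, d uncorrelated with signal.

    g = sum(q*y) / sum(y*y); by construction the residual d = q - g*y has
    zero sample correlation with y.
    """
    num = sum(q * y for q, y in zip(quantized, signal))
    den = sum(y * y for y in signal)
    return num / den

rng = random.Random(0)
sigma = 2.0
y = [rng.gauss(0.0, sigma) for _ in range(200_000)]  # unquantized measurements
q = [1.0 if v >= 0 else -1.0 for v in y]             # 1 bit/sample quantizer
g = linear_gain(y, q)
g_theory = math.sqrt(2.0 / math.pi) / sigma          # Gaussian-input value
```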

Submitted 7 November, 2013; v1 submitted 2 January, 2013; originally announced January 2013.

Comments: 37 pages, 5 figures. Accepted for publication in EURASIP Signal Processing. Accompanying Matlab code is available at: https://github.com/ThomasA/cs-correlated-noise

arXiv:1210.4277

Improving Smoothed l0 Norm in Compressive Sensing Using Adaptive Parameter Selection

Authors: Christian Schou Oxvig, Patrick Steffen Pedersen, Thomas Arildsen, Torben Larsen

Abstract: Signal reconstruction in compressive sensing involves finding a sparse solution that satisfies a set of linear constraints. Several approaches to this problem have been considered in existing reconstruction algorithms. Each provides a trade-off between reconstruction capability and required computation time. In an attempt to push the limits of this trade-off, we consider a smoothed l0 norm (SL0) algorithm in a noiseless setup. We argue that using a set of carefully chosen parameters in our proposed adaptive SL0 algorithm may result in significantly better reconstruction capabilities in terms of phase transition while retaining the same required computation time as existing SL0 algorithms. A large set of simulations further supports this claim. Simulations even reveal that the theoretical l1 curve may be surpassed in major parts of the phase space.
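For reference, the basic (non-adaptive) SL0 iteration that the paper builds on alternates a gradient step on a Gaussian-smoothed count of non-zeros with a projection back onto {x : A x = y}, while shrinking the smoothing width sigma geometrically. The sketch below uses fixed parameters (`mu=2.0`, `inner=3`, `sigma_decrease=0.5`, `sigma_min=1e-4`) chosen for illustration; the paper's adaptive parameter selection is not reproduced, and the helper names are hypothetical.

```python
import math
import random

def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inverse(a):
    """Gauss-Jordan inverse of a small square matrix (partial pivoting)."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

def sl0(a, y, sigma_min=1e-4, sigma_decrease=0.5, mu=2.0, inner=3):
    """Smoothed-l0 reconstruction (noiseless, fixed parameters)."""
    at = transpose(a)
    # (A A^T)^-1 for a full-row-rank A.
    g_inv = inverse([[sum(p * q for p, q in zip(ri, rj)) for rj in a] for ri in a])

    def pinv(v):
        # A^+ v = A^T (A A^T)^-1 v
        return matvec(at, matvec(g_inv, v))

    x = pinv(y)  # minimum-l2-norm feasible starting point
    sigma = 2.0 * max(abs(v) for v in x)
    while sigma > sigma_min:
        for _ in range(inner):
            # Gradient step on the smoothed l0 measure (moves small entries most).
            x = [v - mu * v * math.exp(-v * v / (2.0 * sigma * sigma)) for v in x]
            # Project back onto the feasible set {x : A x = y}.
            r = [ax - yi for ax, yi in zip(matvec(a, x), y)]
            x = [xi - ci for xi, ci in zip(x, pinv(r))]
        sigma *= sigma_decrease
    return x

rng = random.Random(3)
A = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(12)]  # 12 x 16 Gaussian
x_true = [0.0] * 16
x_true[5] = 1.5  # a single non-zero coefficient
x_hat = sl0(A, matvec(A, x_true))
```

The final projection guarantees A x_hat = y up to rounding; the adaptive variant in the paper changes how the parameters are chosen, not this basic loop.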

Submitted 14 March, 2013; v1 submitted 16 October, 2012; originally announced October 2012.

Comments: 7 pages, 4 figures

arXiv:1110.5176

Demodulating Subsampled Direct Sequence Spread Spectrum Signals using Compressive Signal Processing

Authors: Karsten Fyhn, Thomas Arildsen, Torben Larsen, Søren Holdt Jensen

Abstract: We show that, to lower the sampling rate in a spread spectrum communication system using Direct Sequence Spread Spectrum (DSSS), compressive signal processing can be applied to demodulate the received signal. This may lead to a decrease in the power consumption or the manufacturing cost of wireless receivers using spread spectrum technology. The main novelty of this paper is the discovery that in spread spectrum systems it is possible to apply compressive sensing with a much simpler hardware architecture than in other systems, making the implementation both simpler and more energy efficient. Our theoretical work is exemplified with a numerical experiment using the IEEE 802.15.4 standard's 2.4 GHz band specification. The numerical results support our theoretical findings and indicate that compressive sensing may be used successfully in spread spectrum communication systems. The results obtained here may also be applicable to other spread spectrum technologies, such as Code Division Multiple Access (CDMA) systems.
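A generic way to demodulate from fewer measurements than chips is to correlate the compressed measurements y = Phi s with the compressed version Phi c_k of each candidate spreading sequence and pick the best match (a "smashed filter"). The sketch below assumes a random +/-1 measurement matrix `phi` and uses Walsh-Hadamard rows as stand-in spreading codes; the actual IEEE 802.15.4 chip sequences and the paper's simplified hardware architecture are not reproduced, and `compressive_detect` is a hypothetical name.

```python
import random

def walsh_code(index, length):
    """Row `index` of a Sylvester-Hadamard matrix (chips in {+1, -1}); length a power of 2."""
    return [1.0 if bin(index & j).count("1") % 2 == 0 else -1.0 for j in range(length)]

def compressive_detect(y, phi, codes):
    """Return the index of the code whose compressed version best correlates with y.

    y:     compressed measurements, y = phi @ (received chip sequence)
    phi:   M x N measurement matrix with M < N (fewer samples than chips)
    codes: candidate spreading sequences, one per symbol
    """
    def project(v):
        return [sum(p * c for p, c in zip(row, v)) for row in phi]
    scores = [sum(a * b for a, b in zip(y, project(code))) for code in codes]
    return max(range(len(codes)), key=scores.__getitem__)

rng = random.Random(7)
N, M = 256, 128  # chips per symbol, compressed measurements (assumed values)
codes = [walsh_code(k, N) for k in range(1, 5)]
phi = [[rng.choice((-1.0, 1.0)) for _ in range(N)] for _ in range(M)]
sent = 2
y = [sum(p * c for p, c in zip(row, codes[sent])) for row in phi]
decided = compressive_detect(y, phi, codes)
```

In this noiseless example, halving the number of samples still leaves a large correlation margin between the transmitted code and the others, so `decided` matches `sent`.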

Submitted 10 October, 2012; v1 submitted 24 October, 2011; originally announced October 2011.

Comments: 5 pages, 2 figures, presented at EUSIPCO 2012

Showing 1–13 of 13 results for author: Arildsen, T