Search | arXiv e-print repository

doi 10.1145/3447958

Fast Matching Pursuit with Multi-Gabor Dictionaries

Authors: Zdeněk Průša, Nicki Holighaus, Peter Balazs

Abstract: Finding the best K-sparse approximation of a signal in a redundant dictionary is an NP-hard problem. Suboptimal greedy matching pursuit (MP) algorithms are generally used for this task. In this work, we present an acceleration technique and an implementation of the matching pursuit algorithm acting on a multi-Gabor dictionary, i.e., a concatenation of several Gabor-type time-frequency dictionaries… ▽ More Finding the best K-sparse approximation of a signal in a redundant dictionary is an NP-hard problem. Suboptimal greedy matching pursuit (MP) algorithms are generally used for this task. In this work, we present an acceleration technique and an implementation of the matching pursuit algorithm acting on a multi-Gabor dictionary, i.e., a concatenation of several Gabor-type time-frequency dictionaries, each of which consisting of translations and modulations of a possibly different window and time and frequency shift parameters. The technique is based on pre-computing and thresholding inner products between atoms and on updating the residual directly in the coefficient domain, i.e., without the round-trip to the signal domain. Since the proposed acceleration technique involves an approximate update step, we provide theoretical and experimental results illustrating the convergence of the resulting algorithm. The implementation is written in C (compatible with C99 and C++11) and we also provide Matlab and GNU Octave interfaces. For some settings, the implementation is up to 70 times faster than the standard Matching Pursuit Toolkit (MPTK). △ Less

Submitted 16 February, 2022; originally announced February 2022.

arXiv:2202.07498 [pdf, other]

doi 10.23919/EUSIPCO.2017.8081342

Non-iterative Filter Bank Phase (Re)Construction

Authors: Zdeněk Průša, Nicki Holighaus

Abstract: Signal reconstruction from magnitude-only measurements presents a long-standing problem in signal processing. In this contribution, we propose a phase (re)construction method for filter banks with uniform decimation and controlled frequency variation. The suggested procedure extends the recently introduced phase-gradient heap integration and relies on a phase-magnitude relationship for filter bank… ▽ More Signal reconstruction from magnitude-only measurements presents a long-standing problem in signal processing. In this contribution, we propose a phase (re)construction method for filter banks with uniform decimation and controlled frequency variation. The suggested procedure extends the recently introduced phase-gradient heap integration and relies on a phase-magnitude relationship for filter bank coefficients obtained from Gaussian filters. Admissible filter banks are modeled as the discretization of certain generalized translation-invariant systems, for which we derive the phase-magnitude relationship explicitly. The implementation for discrete signals is described and the performance of the algorithm is evaluated on a range of real and synthetic signals. △ Less

Submitted 15 February, 2022; originally announced February 2022.

arXiv:2202.07484 [pdf, other]

Phase-Based Signal Representations for Scattering

Authors: Daniel Haider, Peter Balazs, Nicki Holighaus

Abstract: The scattering transform is a non-linear signal representation method based on cascaded wavelet transform magnitudes. In this paper we introduce phase scattering, a novel approach where we use phase derivatives in a scattering procedure. We first revisit phase-related concepts for representing time-frequency information of audio signals, in particular, the partial derivatives of the phase in the t… ▽ More The scattering transform is a non-linear signal representation method based on cascaded wavelet transform magnitudes. In this paper we introduce phase scattering, a novel approach where we use phase derivatives in a scattering procedure. We first revisit phase-related concepts for representing time-frequency information of audio signals, in particular, the partial derivatives of the phase in the time-frequency domain. By putting analytical and numerical results in a new light, we set the basis to extend the phase-based representations to higher orders by means of a scattering transform, which leads to well localized signal representations of large-scale structures. All the ideas are introduced in a general way and then applied using the STFT. △ Less

Submitted 15 February, 2022; originally announced February 2022.

arXiv:2202.07479 [pdf, ps, other]

Audio Inpainting via $\ell_1$-Minimization and Dictionary Learning

Authors: Shristi Rajbamshi, Georg Tauböck, Peter Balazs, Nicki Holighaus

Abstract: Audio inpainting refers to signal processing techniques that aim at restoring missing or corrupted consecutive samples in audio signals. Prior works have shown that $\ell_1$- minimization with appropriate weighting is capable of solving audio inpainting problems, both for the analysis and the synthesis models. These models assume that audio signals are sparse with respect to some redundant diction… ▽ More Audio inpainting refers to signal processing techniques that aim at restoring missing or corrupted consecutive samples in audio signals. Prior works have shown that $\ell_1$- minimization with appropriate weighting is capable of solving audio inpainting problems, both for the analysis and the synthesis models. These models assume that audio signals are sparse with respect to some redundant dictionary and exploit that sparsity for inpainting purposes. Remaining within the sparsity framework, we utilize dictionary learning to further increase the sparsity and combine it with weighted $\ell_1$-minimization adapted for audio inpainting to compensate for the loss of energy within the gap after restoration. Our experiments demonstrate that our approach is superior in terms of signal-to-distortion ratio (SDR) and objective difference grade (ODG) compared with its original counterpart. △ Less

Submitted 15 February, 2022; originally announced February 2022.

arXiv:2202.07382 [pdf, other]

Phase Vocoder Done Right

Authors: Zdenek Prusa, Nicki Holighaus

Abstract: The phase vocoder (PV) is a widely spread technique for processing audio signals. It employs a short-time Fourier transform (STFT) analysis-modify-synthesis loop and is typically used for time-scaling of signals by means of using different time steps for STFT analysis and synthesis. The main challenge of PV used for that purpose is the correction of the STFT phase. In this paper, we introduce a no… ▽ More The phase vocoder (PV) is a widely spread technique for processing audio signals. It employs a short-time Fourier transform (STFT) analysis-modify-synthesis loop and is typically used for time-scaling of signals by means of using different time steps for STFT analysis and synthesis. The main challenge of PV used for that purpose is the correction of the STFT phase. In this paper, we introduce a novel method for phase correction based on phase gradient estimation and its integration. The method does not require explicit peak picking and tracking nor does it require detection of transients and their separate treatment. Yet, the method does not suffer from the typical phase vocoder artifacts even for extreme time stretching factors. △ Less

Submitted 15 February, 2022; originally announced February 2022.

arXiv:2106.05148 [pdf, other]

doi 10.1109/TSP.2021.3088581

Time-Frequency Phase Retrieval for Audio -- The Effect of Transform Parameters

Authors: Andrés Marafioti, Nicki Holighaus, Piotr Majdak

Abstract: In audio processing applications, phase retrieval (PR) is often performed from the magnitude of short-time Fourier transform (STFT) coefficients. Although PR performance has been observed to depend on the considered STFT parameters and audio data, the extent of this dependence has not been systematically evaluated yet. To address this, we studied the performance of three PR algorithms for various… ▽ More In audio processing applications, phase retrieval (PR) is often performed from the magnitude of short-time Fourier transform (STFT) coefficients. Although PR performance has been observed to depend on the considered STFT parameters and audio data, the extent of this dependence has not been systematically evaluated yet. To address this, we studied the performance of three PR algorithms for various types of audio content and various STFT parameters such as redundancy, time-frequency ratio, and the type of window. The quality of PR was studied in terms of objective difference grade and signal-to-noise ratio of the STFT magnitude, to provide auditory- and signal-based quality assessments. Our results show that PR quality improved with increasing redundancy, with a strong relevance of the time-frequency ratio. The effect of the audio content was smaller but still observable. The effect of the window was only significant for one of the PR algorithms. Interestingly, for a good PR quality, each of the three algorithms required a different set of parameters, demonstrating the relevance of individual parameter sets for a fair comparison across PR algorithms. Based on these results, we developed guidelines for optimizing STFT parameters for a given application. △ Less

Submitted 9 June, 2021; originally announced June 2021.

Comments: Accepted for publication as a regular paper in the IEEE Transactions on Signal Processing

arXiv:2005.05032 [pdf, other]

doi 10.1109/JSTSP.2020.3037506

GACELA -- A generative adversarial context encoder for long audio inpainting

Authors: Andres Marafioti, Piotr Majdak, Nicki Holighaus, Nathanaël Perraudin

Abstract: We introduce GACELA, a generative adversarial network (GAN) designed to restore missing musical audio data with a duration ranging between hundreds of milliseconds to a few seconds, i.e., to perform long-gap audio inpainting. While previous work either addressed shorter gaps or relied on exemplars by copying available information from other signal parts, GACELA addresses the inpainting of long gap… ▽ More We introduce GACELA, a generative adversarial network (GAN) designed to restore missing musical audio data with a duration ranging between hundreds of milliseconds to a few seconds, i.e., to perform long-gap audio inpainting. While previous work either addressed shorter gaps or relied on exemplars by copying available information from other signal parts, GACELA addresses the inpainting of long gaps in two aspects. First, it considers various time scales of audio information by relying on five parallel discriminators with increasing resolution of receptive fields. Second, it is conditioned not only on the available information surrounding the gap, i.e., the context, but also on the latent variable of the conditional GAN. This addresses the inherent multi-modality of audio inpainting at such long gaps and provides the option of user-defined inpainting. GACELA was tested in listening tests on music signals of varying complexity and gap durations ranging from 375~ms to 1500~ms. While our subjects were often able to detect the inpaintings, the severity of the artifacts decreased from unacceptable to mildly disturbing. GACELA represents a framework capable to integrate future improvements such as processing of more auditory-related features or more explicit musical features. △ Less

Submitted 11 May, 2020; originally announced May 2020.

Journal ref: IEEE Journal of Selected Topics in Signal Processing, vol. 15, no. 1, pp. 120-131, Jan. 2021

arXiv:1902.04072 [pdf, other]

Adversarial Generation of Time-Frequency Features with application in audio synthesis

Authors: Andrés Marafioti, Nicki Holighaus, Nathanaël Perraudin, Piotr Majdak

Abstract: Time-frequency (TF) representations provide powerful and intuitive features for the analysis of time series such as audio. But still, generative modeling of audio in the TF domain is a subtle matter. Consequently, neural audio synthesis widely relies on directly modeling the waveform and previous attempts at unconditionally synthesizing audio from neurally generated invertible TF features still st… ▽ More Time-frequency (TF) representations provide powerful and intuitive features for the analysis of time series such as audio. But still, generative modeling of audio in the TF domain is a subtle matter. Consequently, neural audio synthesis widely relies on directly modeling the waveform and previous attempts at unconditionally synthesizing audio from neurally generated invertible TF features still struggle to produce audio at satisfying quality. In this article, focusing on the short-time Fourier transform, we discuss the challenges that arise in audio synthesis based on generated invertible TF features and how to overcome them. We demonstrate the potential of deliberate generative TF modeling by training a generative adversarial network (GAN) on short-time Fourier features. We show that by applying our guidelines, our TF-based network was able to outperform a state-of-the-art GAN generating waveforms directly, despite the similar architecture in the two networks. △ Less

Submitted 16 May, 2019; v1 submitted 11 February, 2019; originally announced February 2019.

Comments: Accepted for publication at ICML 2019

arXiv:1810.12138 [pdf, other]

Audio inpainting of music by means of neural networks

Authors: Andrés Marafioti, Nicki Holighaus, Piotr Majdak, Nathanaël Perraudin

Abstract: We studied the ability of deep neural networks (DNNs) to restore missing audio content based on its context, a process usually referred to as audio inpainting. We focused on gaps in the range of tens of milliseconds. The proposed DNN structure was trained on audio signals containing music and musical instruments, separately, with 64-ms long gaps. The input to the DNN was the context, i.e., the sig… ▽ More We studied the ability of deep neural networks (DNNs) to restore missing audio content based on its context, a process usually referred to as audio inpainting. We focused on gaps in the range of tens of milliseconds. The proposed DNN structure was trained on audio signals containing music and musical instruments, separately, with 64-ms long gaps. The input to the DNN was the context, i.e., the signal surrounding the gap, transformed into time-frequency (TF) coefficients. Our results were compared to those obtained from a reference method based on linear predictive coding (LPC). For music, our DNN significantly outperformed the reference method, demonstrating a generally good usability of the proposed DNN structure for inpainting complex audio signals like music. △ Less

Submitted 18 February, 2022; v1 submitted 29 October, 2018; originally announced October 2018.

Comments: Presented at the 146th AES Convention [arXiv:1810.12138v2]. For the journal version, published in published in IEEE TASLP, see [arXiv:1810.12138v2]

arXiv:1611.00966 [pdf, other]

doi 10.1007/978-3-319-54711-4_10

Frame Theory for Signal Processing in Psychoacoustics

Authors: Peter Balazs, Nicki Holighaus, Thibaud Necciari, Diana Stoeva

Abstract: This review chapter aims to strengthen the link between frame theory and signal processing tasks in psychoacoustics. On the one side, the basic concepts of frame theory are presented and some proofs are provided to explain those concepts in some detail. The goal is to reveal to hearing scientists how this mathematical theory could be relevant for their research. In particular, we focus on frame th… ▽ More This review chapter aims to strengthen the link between frame theory and signal processing tasks in psychoacoustics. On the one side, the basic concepts of frame theory are presented and some proofs are provided to explain those concepts in some detail. The goal is to reveal to hearing scientists how this mathematical theory could be relevant for their research. In particular, we focus on frame theory in a filter bank approach, which is probably the most relevant view-point for audio signal processing. On the other side, basic psychoacoustic concepts are presented to stimulate mathematicians to apply their knowledge in this field. △ Less

Submitted 3 November, 2016; originally announced November 2016.

Journal ref: In: Balan R., Benedetto J., Czaja W., Dellatorre M., Okoudjou K. (eds) Excursions in Harmonic Analysis, Vol. 5. Applied and Numerical Harmonic Analysis. Birkhäuser, Cham, 2017, 225-268

arXiv:1607.06667 [pdf, other]

Inpainting of long audio segments with similarity graphs

Authors: Nathanael Perraudin, Nicki Holighaus, Piotr Majdak, Peter Balazs

Abstract: We present a novel method for the compensation of long duration data loss in audio signals, in particular music. The concealment of such signal defects is based on a graph that encodes signal structure in terms of time-persistent spectral similarity. A suitable candidate segment for the substitution of the lost content is proposed by an intuitive optimization scheme and smoothly inserted into the… ▽ More We present a novel method for the compensation of long duration data loss in audio signals, in particular music. The concealment of such signal defects is based on a graph that encodes signal structure in terms of time-persistent spectral similarity. A suitable candidate segment for the substitution of the lost content is proposed by an intuitive optimization scheme and smoothly inserted into the gap, i.e. the lost or distorted signal region. Extensive listening tests show that the proposed algorithm provides highly promising results when applied to a variety of real-world music signals. △ Less

Submitted 23 February, 2018; v1 submitted 22 July, 2016; originally announced July 2016.

arXiv:1601.06652 [pdf, ps, other]

A Perceptually Motivated Filter Bank with Perfect Reconstruction for Audio Signal Processing

Authors: Thibaud Necciari, Nicki Holighaus, Peter Balazs, Zdenek Prusa

Abstract: Many audio applications rely on filter banks (FBs) to analyze, process, and re-synthesize sounds. To approximate the auditory frequency resolution in the signal chain, some applications rely on perceptually motivated FBs, the gammatone FB being a popular example. However, most perceptually motivated FBs only allow partial signal reconstruction at high redundancies and/or do not have good resistanc… ▽ More Many audio applications rely on filter banks (FBs) to analyze, process, and re-synthesize sounds. To approximate the auditory frequency resolution in the signal chain, some applications rely on perceptually motivated FBs, the gammatone FB being a popular example. However, most perceptually motivated FBs only allow partial signal reconstruction at high redundancies and/or do not have good resistance to sub-channel processing. This paper introduces an oversampled perceptually motivated FB enabling perfect reconstruction, efficient FB design, and adaptable redundancy. The filters are directly constructed in the frequency domain and linearly distributed on a perceptual frequency scale (e.g. ERB, Bark, or Mel scale). The proposed design allows for various filter shapes, uniform or non-uniform FB setting, and large down-sampling factors. For redundancies $\geq$ 3 perfect reconstruction is achieved by computing the canonical dual FB analytically. For lower redundancies perfect reconstruction is achieved using an iterative method. Experiments show performance improvements of the proposed approach when compared to the gammatone FB in terms of reconstruction error and resistance to sub-channel processing, especially at low redundancies. △ Less

Submitted 25 January, 2016; originally announced January 2016.

arXiv:1311.0897 [pdf, other]

Spectrum-Adapted Tight Graph Wavelet and Vertex-Frequency Frames

Authors: David I Shuman, Christoph Wiesmeyr, Nicki Holighaus, Pierre Vandergheynst

Abstract: We consider the problem of designing spectral graph filters for the construction of dictionaries of atoms that can be used to efficiently represent signals residing on weighted graphs. While the filters used in previous spectral graph wavelet constructions are only adapted to the length of the spectrum, the filters proposed in this paper are adapted to the distribution of graph Laplacian eigenvalu… ▽ More We consider the problem of designing spectral graph filters for the construction of dictionaries of atoms that can be used to efficiently represent signals residing on weighted graphs. While the filters used in previous spectral graph wavelet constructions are only adapted to the length of the spectrum, the filters proposed in this paper are adapted to the distribution of graph Laplacian eigenvalues, and therefore lead to atoms with better discriminatory power. Our approach is to first characterize a family of systems of uniformly translated kernels in the graph spectral domain that give rise to tight frames of atoms generated via generalized translation on the graph. We then warp the uniform translates with a function that approximates the cumulative spectral density function of the graph Laplacian eigenvalues. We use this approach to construct computationally efficient, spectrum-adapted, tight vertex-frequency and graph wavelet frames. We give numerous examples of the resulting spectrum-adapted graph filters, and also present an illustrative example of vertex-frequency analysis using the proposed construction. △ Less

Submitted 4 November, 2013; originally announced November 2013.

Showing 1–13 of 13 results for author: Holighaus, N