Search | arXiv e-print repository

Mean-Field Microcanonical Gradient Descent

Authors: Marcus Häggbom, Morten Karlsmark, Joakim Andén

Abstract: Microcanonical gradient descent is a sampling procedure for energy-based models allowing for efficient sampling of distributions in high dimension. It works by transporting samples from a high-entropy distribution, such as Gaussian white noise, to a low-energy region using gradient descent. We put this model in the framework of normalizing flows, showing how it can often overfit by losing an unnecessary amount of entropy in the descent. As a remedy, we propose a mean-field microcanonical gradient descent that samples several weakly coupled data points simultaneously, allowing for better control of the entropy loss while paying little in terms of likelihood fit. We study these models in the context of financial time series, illustrating the improvements on both synthetic and real data.

Submitted 27 May, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

arXiv:2102.08463 [pdf, other]

cuFINUFFT: a load-balanced GPU library for general-purpose nonuniform FFTs

Authors: Yu-hsuan Shih, Garrett Wright, Joakim Andén, Johannes Blaschke, Alex H. Barnett

Abstract: Nonuniform fast Fourier transforms dominate the computational cost in many applications including image reconstruction and signal processing. We thus present a general-purpose GPU-based CUDA library for type 1 (nonuniform to uniform) and type 2 (uniform to nonuniform) transforms in dimensions 2 and 3, in single or double precision. It achieves high performance for a given user-requested accuracy, regardless of the distribution of nonuniform points, via cache-aware point reordering, and load-balanced blocked spreading in shared memory. At low accuracies, this gives on-GPU throughputs around $10^9$ nonuniform points per second, and (even including host-device transfer) is typically 4-10$\times$ faster than the latest parallel CPU code FINUFFT (at 28 threads). It is competitive with two established GPU codes, being up to 90$\times$ faster at high accuracy and/or type 1 clustered point distributions. Finally we demonstrate a 5-12$\times$ speedup versus CPU in an X-ray diffraction 3D iterative reconstruction task at $10^{-12}$ accuracy, observing excellent multi-GPU weak scaling up to one rank per GPU.

Submitted 25 March, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

Comments: 10 pages, 9 figures

arXiv:2007.10926 [pdf, other]

Time-Frequency Scattering Accurately Models Auditory Similarities Between Instrumental Playing Techniques

Authors: Vincent Lostanlen, Christian El-Hajj, Mathias Rossignol, Grégoire Lafay, Joakim Andén, Mathieu Lagrange

Abstract: Instrumental playing techniques such as vibratos, glissandos, and trills often denote musical expressivity, both in classical and folk contexts. However, most existing approaches to music similarity retrieval fail to describe timbre beyond the so-called "ordinary" technique, use instrument identity as a proxy for timbre quality, and do not allow for customization to the perceptual idiosyncrasies of a new subject. In this article, we ask 31 human subjects to organize 78 isolated notes into a set of timbre clusters. Analyzing their responses suggests that timbre perception operates within a more flexible taxonomy than those provided by instruments or playing techniques alone. In addition, we propose a machine listening model to recover the cluster graph of auditory similarities across instruments, mutes, and techniques. Our model relies on joint time--frequency scattering features to extract spectrotemporal modulations as acoustic features. Furthermore, it minimizes triplet loss in the cluster graph by means of the large-margin nearest neighbor (LMNN) metric learning algorithm. Over a dataset of 9346 isolated notes, we report a state-of-the-art average precision at rank five (AP@5) of $99.0\%\pm1$. An ablation study demonstrates that removing either the joint time--frequency scattering transform or the metric learning algorithm noticeably degrades performance.

Submitted 10 November, 2020; v1 submitted 21 July, 2020; originally announced July 2020.

Comments: 32 pages, 5 figures. To appear in EURASIP Journal on Audio, Speech, and Music Processing (JASMP)

arXiv:1908.03454 [pdf, other]

Bias and variance reduction and denoising for CTF Estimation

Authors: Ayelet Heimowitz, Joakim Andén, Amit Singer

Abstract: When using an electron microscope for imaging of particles embedded in vitreous ice, the objective lens will inevitably corrupt the projection images. This corruption manifests as a band-pass filter on the micrograph. In addition, it causes the phase of several frequency bands to be flipped and distorts frequency bands. As a precursor to compensating for this distortion, the corrupting point spread function, which is termed the contrast transfer function (CTF) in reciprocal space, must be estimated. In this paper, we will present a novel method for CTF estimation. Our method is based on the multi-taper method for power spectral density estimation, which aims to reduce the bias and variance of the estimator. Furthermore, we use known properties of the CTF and of the background of the power spectrum to increase the accuracy of our estimation. We will show that the resulting estimates capture the zero-crossings of the CTF in the low-mid frequency range.

Submitted 28 January, 2020; v1 submitted 9 August, 2019; originally announced August 2019.

arXiv:1907.01898 [pdf, other]

doi 10.1088/1361-6420/ab4f55

Cryo-EM reconstruction of continuous heterogeneity by Laplacian spectral volumes

Authors: Amit Moscovich, Amit Halevi, Joakim Andén, Amit Singer

Abstract: Single-particle electron cryomicroscopy is an essential tool for high-resolution 3D reconstruction of proteins and other biological macromolecules. An important challenge in cryo-EM is the reconstruction of non-rigid molecules with parts that move and deform. Traditional reconstruction methods fail in these cases, resulting in smeared reconstructions of the moving parts. This poses a major obstacle for structural biologists, who need high-resolution reconstructions of entire macromolecules, moving parts included. To address this challenge, we present a new method for the reconstruction of macromolecules exhibiting continuous heterogeneity. The proposed method uses projection images from multiple viewing directions to construct a graph Laplacian through which the manifold of three-dimensional conformations is analyzed. The 3D molecular structures are then expanded in a basis of Laplacian eigenvectors, using a novel generalized tomographic reconstruction algorithm to compute the expansion coefficients. These coefficients, which we name spectral volumes, provide a high-resolution visualization of the molecular dynamics. We provide a theoretical analysis and evaluate the method empirically on several simulated data sets.

Submitted 26 September, 2019; v1 submitted 1 July, 2019; originally announced July 2019.

Comments: 33 pages, 10 figures

MSC Class: 62P10; 65R32; 94A08

ACM Class: I.4.5; I.5.4; J.3

Journal ref: Inverse Problems, 36:2 (2020)

arXiv:1907.01589 [pdf, other]

doi 10.1088/1361-6420/ab5ede

Hyper-Molecules: on the Representation and Recovery of Dynamical Structures, with Application to Flexible Macro-Molecular Structures in Cryo-EM

Authors: Roy R. Lederman, Joakim Andén, Amit Singer

Abstract: Cryo-electron microscopy (cryo-EM), the subject of the 2017 Nobel Prize in Chemistry, is a technology for determining the 3-D structure of macromolecules from many noisy 2-D projections of instances of these macromolecules, whose orientations and positions are unknown. The molecular structures are not rigid objects, but flexible objects involved in dynamical processes. The different conformations are exhibited by different instances of the macromolecule observed in a cryo-EM experiment, each of which is recorded as a particle image. The range of conformations and the conformation of each particle are not known a priori; one of the great promises of cryo-EM is to map this conformation space. Remarkable progress has been made in determining rigid structures from homogeneous samples of molecules in spite of the unknown orientation of each particle image and significant progress has been made in recovering a few distinct states from mixtures of rather distinct conformations, but more complex heterogeneous samples remain a major challenge. We introduce the "hyper-molecule" framework for modeling structures across different states of heterogeneous molecules, including continuums of states. The key idea behind this framework is representing heterogeneous macromolecules as high-dimensional objects, with the additional dimensions representing the conformation space. This idea is then refined to model properties such as localized heterogeneity. In addition, we introduce an algorithmic framework for recovering such maps of heterogeneous objects from experimental data using a Bayesian formulation of the problem and Markov chain Monte Carlo (MCMC) algorithms to address the computational challenges in recovering these high dimensional hyper-molecules. We demonstrate these ideas in a prototype applied to synthetic data.

Submitted 2 July, 2019; originally announced July 2019.

arXiv:1812.11214 [pdf, ps, other]

Kymatio: Scattering Transforms in Python

Authors: Mathieu Andreux, Tomás Angles, Georgios Exarchakis, Roberto Leonarduzzi, Gaspar Rochette, Louis Thiry, John Zarka, Stéphane Mallat, Joakim Andén, Eugene Belilovsky, Joan Bruna, Vincent Lostanlen, Muawiz Chaudhary, Matthew J. Hirn, Edouard Oyallon, Sixin Zhang, Carmine Cella, Michael Eickenberg

Abstract: The wavelet scattering transform is an invariant signal representation suitable for many signal processing and machine learning applications. We present the Kymatio software package, an easy-to-use, high-performance Python implementation of the scattering transform in 1D, 2D, and 3D that is compatible with modern deep learning frameworks. All transforms may be executed on a GPU (in addition to CPU), offering a considerable speed up over CPU implementations. The package also has a small memory footprint, resulting in efficient memory usage. The source code, documentation, and examples are available under a BSD license at https://www.kymat.io/

Submitted 31 May, 2022; v1 submitted 28 December, 2018; originally announced December 2018.

arXiv:1812.03225 [pdf, other]

Multitaper estimation on arbitrary domains

Authors: Joakim Andén, José Luis Romero

Abstract: Multitaper estimators have enjoyed significant success in estimating spectral densities from finite samples using as tapers Slepian functions defined on the acquisition domain. Unfortunately, the numerical calculation of these Slepian tapers is only tractable for certain symmetric domains, such as rectangles or disks. In addition, no performance bounds are currently available for the mean squared error of the spectral density estimate. This situation is inadequate for applications such as cryo-electron microscopy, where noise models must be estimated from irregular domains with small sample sizes. We show that the multitaper estimator only depends on the linear space spanned by the tapers. As a result, Slepian tapers may be replaced by proxy tapers spanning the same subspace (validating the common practice of using partially converged solutions to the Slepian eigenproblem as tapers). These proxies may consequently be calculated using standard numerical algorithms for block diagonalization. We also prove a set of performance bounds for multitaper estimators on arbitrary domains. The method is demonstrated on synthetic and experimental datasets from cryo-electron microscopy, where it reduces mean squared error by a factor of two or more compared to traditional methods.

Submitted 18 June, 2020; v1 submitted 7 December, 2018; originally announced December 2018.

Comments: 28 pages, 11 figures

Journal ref: SIAM Journal on Imaging Sciences, 13(3):1565-1594, 2020

arXiv:1808.09730 [pdf, other]

Extended playing techniques: The next milestone in musical instrument recognition

Authors: Vincent Lostanlen, Joakim Andén, Mathieu Lagrange

Abstract: The expressive variability in producing a musical note conveys information essential to the modeling of orchestration and style. As such, it plays a crucial role in computer-assisted browsing of massive digital music corpora. Yet, although the automatic recognition of a musical instrument from the recording of a single "ordinary" note is considered a solved problem, automatic identification of instrumental playing technique (IPT) remains largely underdeveloped. We benchmark machine listening systems for query-by-example browsing among 143 extended IPTs for 16 instruments, amounting to 469 triplets of instrument, mute, and technique. We identify and discuss three necessary conditions for significantly outperforming the traditional mel-frequency cepstral coefficient (MFCC) baseline: the addition of second-order scattering coefficients to account for amplitude modulation, the incorporation of long-range temporal dependencies, and metric learning using large-margin nearest neighbors (LMNN) to reduce intra-class variability. Evaluating on the Studio On Line (SOL) dataset, we obtain a precision at rank 5 of 99.7% for instrument recognition (baseline at 89.0%) and of 61.0% for IPT recognition (baseline at 44.5%). We interpret this gain through a qualitative assessment of practical usability and visualization using nonlinear dimensionality reduction.

Submitted 29 August, 2018; originally announced August 2018.

Comments: 10 pages, 9 figures. The source code to reproduce the experiments of this paper is made available at: https://www.github.com/mathieulagrange/dlfm2018

Journal ref: Proceedings of the 5th International Workshop on Digital Libraries for Musicology (DLfM), Paris, France, September 2018. Published by ACM's International Conference Proceedings Series (ICPS)

arXiv:1807.08869 [pdf, other]

doi 10.1109/TSP.2019.2918992

Joint Time-Frequency Scattering

Authors: Joakim Andén, Vincent Lostanlen, Stéphane Mallat

Abstract: In time series classification and regression, signals are typically mapped into some intermediate representation used for constructing models. Since the underlying task is often insensitive to time shifts, these representations are required to be time-shift invariant. We introduce the joint time-frequency scattering transform, a time-shift invariant representation which characterizes the multiscale energy distribution of a signal in time and frequency. It is computed through wavelet convolutions and modulus non-linearities and may therefore be implemented as a deep convolutional neural network whose filters are not learned but calculated from wavelets. We consider the progression from mel-spectrograms to time scattering and joint time-frequency scattering transforms, illustrating the relationship between increased discriminability and refinements of convolutional network architectures. The suitability of the joint time-frequency scattering transform for time-shift invariant characterization of time series is demonstrated through applications to chirp signals and audio synthesis experiments. The proposed transform also obtains state-of-the-art results on several audio classification tasks, outperforming time scattering transforms and achieving accuracies comparable to those of fully learned networks.

Submitted 12 July, 2019; v1 submitted 23 July, 2018; originally announced July 2018.

Comments: 14 pages, 10 figures

Journal ref: IEEE Transactions on Signal Processing, vol. 67, no. 14, pp. 3704-3718, July 15, 2019

arXiv:1802.00469 [pdf, other]

APPLE Picker: Automatic Particle Picking, a Low-Effort Cryo-EM Framework

Authors: Ayelet Heimowitz, Joakim Andén, Amit Singer

Abstract: Particle picking is a crucial first step in the computational pipeline of single-particle cryo-electron microscopy (cryo-EM). Selecting particles from the micrographs is difficult especially for small particles with low contrast. As high-resolution reconstruction typically requires hundreds of thousands of particles, manually picking that many particles is often too time-consuming. While semi-automated particle picking is currently a popular approach, it may suffer from introducing manual bias into the selection process. In addition, semi-automated particle picking is still somewhat time-consuming. This paper presents the APPLE (Automatic Particle Picking with Low user Effort) picker, a simple and novel approach for fast, accurate, and fully automatic particle picking. While our approach was inspired by template matching, it is completely template-free. This approach is evaluated on publicly available datasets containing micrographs of $β$-galactosidase and keyhole limpet hemocyanin projections.

Submitted 14 June, 2018; v1 submitted 1 February, 2018; originally announced February 2018.

Comments: 18 pages, 14 figures

arXiv:1512.02125 [pdf, other]

doi 10.1109/MLSP.2015.7324385

Joint Time-Frequency Scattering for Audio Classification

Authors: Joakim Andén, Vincent Lostanlen, Stéphane Mallat

Abstract: We introduce the joint time-frequency scattering transform, a time shift invariant descriptor of time-frequency structure for audio classification. It is obtained by applying a two-dimensional wavelet transform in time and log-frequency to a time-frequency wavelet scalogram. We show that this descriptor successfully characterizes complex time-frequency phenomena such as time-varying filters and frequency modulated excitations. State-of-the-art results are achieved for signal reconstruction and phone segment classification on the TIMIT dataset.

Submitted 7 December, 2015; originally announced December 2015.

Comments: 6 pages, 2 figures in IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP), 2015. Sept. 17-20. Boston, USA

arXiv:1412.0985 [pdf, other]

doi 10.1109/ISBI.2015.7163849

Covariance estimation using conjugate gradient for 3D classification in Cryo-EM

Authors: Joakim Andén, Eugene Katsevich, Amit Singer

Abstract: Classifying structural variability in noisy projections of biological macromolecules is a central problem in Cryo-EM. In this work, we build on a previous method for estimating the covariance matrix of the three-dimensional structure present in the molecules being imaged. Our proposed method allows for incorporation of contrast transfer function and non-uniform distribution of viewing angles, making it more suitable for real-world data. We evaluate its performance on a synthetic dataset and an experimental dataset obtained by imaging a 70S ribosome complex.

Submitted 11 February, 2015; v1 submitted 2 December, 2014; originally announced December 2014.

arXiv:1304.6763 [pdf, other]

doi 10.1109/TSP.2014.2326991

Deep Scattering Spectrum

Authors: Joakim Andén, Stéphane Mallat

Abstract: A scattering transform defines a locally translation invariant representation which is stable to time-warping deformations. It extends MFCC representations by computing modulation spectrum coefficients of multiple orders, through cascades of wavelet convolutions and modulus operators. Second-order scattering coefficients characterize transient phenomena such as attacks and amplitude modulation. A frequency transposition invariant representation is obtained by applying a scattering transform along log-frequency. State-of-the-art classification results are obtained for musical genre and phone classification on GTZAN and TIMIT databases, respectively.

Submitted 10 January, 2014; v1 submitted 24 April, 2013; originally announced April 2013.

Showing 1–14 of 14 results for author: andén, J