-
G-invariant diffusion maps
Authors:
Eitan Rosen,
Xiuyuan Cheng,
Yoel Shkolnisky
Abstract:
The diffusion maps embedding of data lying on a manifold has shown success in tasks ranging from dimensionality reduction and clustering to data visualization. In this work, we consider embedding data sets sampled from a manifold that is closed under the action of a continuous matrix group. An example of such a data set is images whose planar rotations are arbitrary. The G-invariant graph Laplacian, introduced in a previous work of the authors, admits eigenfunctions in the form of tensor products between the elements of the irreducible unitary representations of the group and eigenvectors of certain matrices. We employ these eigenfunctions to derive diffusion maps that intrinsically account for the group action on the data. In particular, we construct both equivariant and invariant embeddings which can be used naturally to cluster and align the data points. We demonstrate the effectiveness of our construction with simulated data.
Submitted 25 July, 2023; v1 submitted 12 June, 2023;
originally announced June 2023.
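For orientation, the standard (not G-invariant) diffusion maps embedding that this abstract builds on can be sketched as follows. This is a minimal illustration under assumptions that do not come from the paper: a Gaussian kernel with a hand-picked bandwidth `eps`, the plain random-walk normalization, and a hypothetical function name `diffusion_maps`. It does not implement the equivariant or invariant embeddings the authors construct.

```python
import numpy as np

def diffusion_maps(X, eps, n_components=2, t=1):
    """Standard diffusion maps sketch (illustrative, not the paper's method)."""
    # Pairwise squared Euclidean distances between the rows of X.
    sq = np.sum(X**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Gaussian affinity matrix (bandwidth eps is an assumed parameter).
    W = np.exp(-D2 / eps)
    d = W.sum(axis=1)
    # Symmetric matrix similar to the Markov matrix P = D^{-1} W.
    S = W / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]          # sort eigenvalues in descending order
    vals, vecs = vals[order], vecs[:, order]
    # Right eigenvectors of P are D^{-1/2} times the eigenvectors of S.
    psi = vecs / np.sqrt(d)[:, None]
    # Drop the trivial constant eigenvector; scale coordinates by eigenvalue^t.
    embedding = psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t
    return embedding, vals

# Usage: points sampled on a circle.
theta = np.linspace(0.0, 2.0 * np.pi, 60, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)])
embedding, eigenvalues = diffusion_maps(X, eps=0.5)
print(embedding.shape)
```

The symmetric conjugate `S` is used only so that a symmetric eigensolver applies; the embedding coordinates are those of the Markov matrix.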
-
The G-invariant graph Laplacian
Authors:
Eitan Rosen,
Paulina Hoyos,
Xiuyuan Cheng,
Joe Kileel,
Yoel Shkolnisky
Abstract:
Graph Laplacian based algorithms for data lying on a manifold have been proven effective for tasks such as dimensionality reduction, clustering, and denoising. In this work, we consider data sets whose data points lie on a manifold that is closed under the action of a known unitary matrix Lie group G. We propose to construct the graph Laplacian by incorporating the distances between all the pairs of points generated by the action of G on the data set. We deem the latter construction the ``G-invariant Graph Laplacian'' (G-GL). We show that the G-GL converges to the Laplace-Beltrami operator on the data manifold, while enjoying a significantly improved convergence rate compared to the standard graph Laplacian which only utilizes the distances between the points in the given data set. Furthermore, we show that the G-GL admits a set of eigenfunctions that have the form of certain products between the group elements and eigenvectors of certain matrices, which can be estimated from the data efficiently using FFT-type algorithms. We demonstrate our construction and its advantages on the problem of filtering data on a noisy manifold closed under the action of the special unitary group SU(2).
Submitted 28 June, 2024; v1 submitted 29 March, 2023;
originally announced March 2023.
-
ASOCEM: Automatic Segmentation Of Contaminations in cryo-EM
Authors:
Amitay Eldar,
Ido Amos,
Yoel Shkolnisky
Abstract:
Particle picking is currently a critical step in the cryo-electron microscopy single particle reconstruction pipeline. Contaminations in the acquired micrographs severely degrade the performance of particle pickers, resulting in many ``non-particles'' in the collected stack of particles. In this paper, we present ASOCEM (Automatic Segmentation Of Contaminations in cryo-EM), an automatic method to detect and segment contaminations, which requires as input only the approximate particle size. In particular, it requires neither parameter tuning nor manual intervention. Our method is based on the observation that the statistical distribution of contaminated regions is different from that of the rest of the micrograph. This nonrestrictive assumption allows automatic detection of various types of contaminations, from the carbon edges of the supporting grid to high contrast blobs of different sizes. We demonstrate the efficiency of our algorithm using various experimental data sets containing various types of contaminations. ASOCEM is integrated as part of the KLT picker \cite{ELDAR2020107473} and is available at \url{https://github.com/ShkolniskyLab/kltpicker2}.
Submitted 18 January, 2022;
originally announced January 2022.
-
A Perturbation-Based Kernel Approximation Framework
Authors:
Roy Mitz,
Yoel Shkolnisky
Abstract:
Kernel methods are powerful tools in various data analysis tasks. Yet, in many cases, their time and space complexity render them impractical for large datasets. Various kernel approximation methods have been proposed to overcome this issue, with the most prominent method being the Nyström method. In this paper, we derive a perturbation-based kernel approximation framework building upon results from classical perturbation theory. We provide an error analysis for this framework, and prove that in fact, it generalizes the Nyström method and several of its variants. Furthermore, we show that our framework gives rise to new kernel approximation schemes that can be tuned to take advantage of the structure of the approximated kernel matrix. We support our theoretical results numerically and demonstrate the advantages of our approximation framework on both synthetic and real-world data.
Submitted 23 May, 2022; v1 submitted 7 September, 2020;
originally announced September 2020.
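As background for the abstract above, the classical Nyström approximation that the framework is said to generalize can be sketched as follows. This is the textbook construction, not the perturbation-based scheme of the paper; the function name `nystrom`, the Gaussian kernel, the landmark choice, and the pseudo-inverse cutoff are all assumptions made for illustration.

```python
import numpy as np

def nystrom(K_cols, landmark_idx, rcond=1e-10):
    """Classical Nystrom approximation K ~ C W^+ C^T (illustrative sketch).

    K_cols       : n x m array holding the kernel columns K[:, landmark_idx]
    landmark_idx : indices of the m landmark points
    rcond        : assumed cutoff for small singular values in the pseudo-inverse
    """
    # W is the m x m kernel block among the landmarks themselves.
    W = K_cols[landmark_idx, :]
    return K_cols @ np.linalg.pinv(W, rcond=rcond) @ K_cols.T

# Usage: approximate a Gaussian kernel matrix from 20 of its 200 columns.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
sq = np.sum(X**2, axis=1)
K = np.exp(-(sq[:, None] + sq[None, :] - 2.0 * X @ X.T) / 2.0)
idx = rng.choice(200, size=20, replace=False)
K_approx = nystrom(K[:, idx], idx)
print(np.linalg.norm(K - K_approx) / np.linalg.norm(K))  # relative error
```

Only the `m` sampled columns of the kernel matrix are needed to form the approximation, which is the source of the time and space savings the abstract refers to.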
-
ROIPCA: An online memory-restricted PCA algorithm based on rank-one updates
Authors:
Roy Mitz,
Yoel Shkolnisky
Abstract:
Principal components analysis (PCA) is a fundamental algorithm in data analysis. Its memory-restricted online versions are useful in many modern applications, where the data are too large to fit in memory, or when data arrive as a stream of items. In this paper, we propose ROIPCA and fROIPCA, two online PCA algorithms that are based on rank-one updates. While ROIPCA is typically more accurate, fROIPCA is faster and has comparable accuracy. We show the relation between fROIPCA and an existing popular gradient algorithm for online PCA, and in particular, prove that fROIPCA is in fact a gradient algorithm with an optimal learning rate. We demonstrate numerically the advantages of our algorithms over existing state-of-the-art algorithms in terms of accuracy and runtime.
Submitted 7 June, 2023; v1 submitted 25 November, 2019;
originally announced November 2019.
-
Multi-reference factor analysis: low-rank covariance estimation under unknown translations
Authors:
Boris Landa,
Yoel Shkolnisky
Abstract:
We consider the problem of estimating the covariance matrix of a random signal observed through unknown translations (modeled by cyclic shifts) and corrupted by noise. Solving this problem allows one to discover low-rank structures masked by the existence of translations (which act as nuisance parameters), with direct application to Principal Components Analysis (PCA). We assume that the underlying signal is of length $L$ and follows a standard factor model with mean zero and $r$ normally-distributed factors. To recover the covariance matrix in this case, we propose to employ the second- and fourth-order shift-invariant moments of the signal known as the $\textit{power spectrum}$ and the $\textit{trispectrum}$. We prove that they are sufficient for recovering the covariance matrix (under a certain technical condition) when $r<\sqrt{L}$. Correspondingly, we provide a polynomial-time procedure for estimating the covariance matrix from many (translated and noisy) observations, where no explicit knowledge of $r$ is required, and prove the procedure's statistical consistency. While our results establish that covariance estimation is possible from the power spectrum and the trispectrum for low-rank covariance matrices, we prove that this is not the case for full-rank covariance matrices. We conduct numerical experiments that corroborate our theoretical findings, and demonstrate the favorable performance of our algorithms in various settings, including in high levels of noise.
Submitted 21 September, 2020; v1 submitted 1 June, 2019;
originally announced June 2019.
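The shift-invariance that the abstract above relies on can be checked numerically for the power spectrum. The sketch below only illustrates that a cyclic shift leaves the squared DFT magnitudes unchanged; it is not the estimation procedure of the paper, it does not cover the trispectrum, and the helper name `power_spectrum` is an assumption.

```python
import numpy as np

def power_spectrum(x):
    """Squared magnitude of the DFT of a 1-D signal."""
    return np.abs(np.fft.fft(x)) ** 2

rng = np.random.default_rng(0)
x = rng.standard_normal(16)

# A cyclic shift multiplies each DFT coefficient by a unit-modulus phase,
# so the squared magnitudes are unchanged.
shifted = np.roll(x, 5)
print(np.allclose(power_spectrum(x), power_spectrum(shifted)))  # prints True
```

Because the shift only changes phases, any statistic built from the squared magnitudes can be averaged over observations without first aligning them.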
-
Rank-one Multi-Reference Factor Analysis
Authors:
Yariv Aizenbud,
Boris Landa,
Yoel Shkolnisky
Abstract:
In recent years, there has been a growing need for processing methods aimed at extracting useful information from large datasets. In many cases the challenge is to discover a low-dimensional structure in the data, often concealed by the existence of nuisance parameters and noise. Motivated by such challenges, we consider the problem of estimating a signal from its scaled, cyclically-shifted and noisy observations. We focus on the particularly challenging regime of low signal-to-noise ratio (SNR), where different observations cannot be shift-aligned. We show that an accurate estimation of the signal from its noisy observations is possible, and derive a procedure which is proved to consistently estimate the signal. The asymptotic sample complexity (the number of observations required to recover the signal) of the procedure is $1/\operatorname{SNR}^4$. Additionally, we propose a procedure which is experimentally shown to improve the sample complexity by a factor equal to the signal's length. Finally, we present numerical experiments which demonstrate the performance of our algorithms, and corroborate our theoretical findings.
Submitted 4 June, 2019; v1 submitted 29 May, 2019;
originally announced May 2019.
-
A common lines approach for ab-initio modeling of cyclically-symmetric molecules
Authors:
Gabi Pragier,
Yoel Shkolnisky
Abstract:
One of the challenges in single particle reconstruction in cryo-electron microscopy is to find a three-dimensional model of a molecule using its two-dimensional noisy projection-images. In this paper, we propose a robust "angular reconstitution" algorithm for molecules with $n$-fold cyclic symmetry that estimates the orientation parameters of the projection-images. Our suggested method utilizes self common lines which induce identical lines within the Fourier transform of each projection-image. We show that the location of self common lines admits quite a few favorable geometrical constraints, thus making it possible to detect them even in a noisy setting. In addition, for molecules with higher order rotational symmetry, our proposed method exploits the fact that there exist numerous common lines between any two Fourier transformed projection-images of such molecules, thus making it possible to determine their relative orientation even under high levels of noise. The efficacy of our proposed method is demonstrated using numerical experiments conducted on simulated and experimental data.
Submitted 24 June, 2019; v1 submitted 20 January, 2019;
originally announced January 2019.
-
The steerable graph Laplacian and its application to filtering image data-sets
Authors:
Boris Landa,
Yoel Shkolnisky
Abstract:
In recent years, improvements in various image acquisition techniques have given rise to the need for adaptive processing methods, aimed particularly at large datasets corrupted by noise and deformations. In this work, we consider datasets of images sampled from a low-dimensional manifold (i.e. an image-valued manifold), where the images can assume arbitrary planar rotations. To derive an adaptive and rotation-invariant framework for processing such datasets, we introduce a graph Laplacian (GL)-like operator over the dataset, termed ${\textit{steerable graph Laplacian}}$. Essentially, the steerable GL extends the standard GL by accounting for all (infinitely-many) planar rotations of all images. As it turns out, similarly to the standard GL, a properly normalized steerable GL converges to the Laplace-Beltrami operator on the low-dimensional manifold. However, the steerable GL admits an improved convergence rate compared to the GL, where the improved convergence behaves as if the intrinsic dimension of the underlying manifold is lower by one. Moreover, it is shown that the steerable GL admits eigenfunctions of the form of Fourier modes (along the orbits of the images' rotations) multiplied by eigenvectors of certain matrices, which can be computed efficiently by the FFT. For image datasets corrupted by noise, we employ a subset of these eigenfunctions to "filter" the dataset via a Fourier-like filtering scheme, essentially using all images and their rotations simultaneously. We demonstrate our filtering framework by de-noising simulated single-particle cryo-EM image datasets.
Submitted 7 August, 2018; v1 submitted 6 February, 2018;
originally announced February 2018.
-
A max-cut approach to heterogeneity in cryo-electron microscopy
Authors:
Yariv Aizenbud,
Yoel Shkolnisky
Abstract:
The field of cryo-electron microscopy has made astounding advancements in the past few years, mainly due to advancements in electron detectors' technology. Yet, one of the key open challenges of the field remains the processing of heterogeneous data sets, produced from samples containing particles at several different conformational states. For such data sets, the algorithms must include some classification procedure to identify homogeneous groups within the data, so that the images in each group correspond to the same underlying structure. The fundamental importance of the heterogeneity problem in cryo-electron microscopy has drawn many research efforts, and resulted in significant progress in classification algorithms for heterogeneous data sets. While these algorithms are extremely useful and effective in practice, they lack rigorous mathematical analysis and performance guarantees.
In this paper, we attempt to make the first steps towards rigorous mathematical analysis of the heterogeneity problem in cryo-electron microscopy. To that end, we present an algorithm for processing heterogeneous data sets, and prove accuracy and stability bounds for it. We also suggest an extension of this algorithm that combines the classification and reconstruction steps. We demonstrate it on simulated data, and compare its performance to the state-of-the-art algorithm in RELION.
Submitted 3 October, 2019; v1 submitted 5 September, 2016;
originally announced September 2016.
-
Steerable Principal Components for Space-Frequency Localized Images
Authors:
Boris Landa,
Yoel Shkolnisky
Abstract:
This paper describes a fast and accurate method for obtaining steerable principal components from a large dataset of images, assuming the images are well localized in space and frequency. The obtained steerable principal components are optimal for expanding the images in the dataset and all of their rotations. The method relies upon first expanding the images using a series of two-dimensional Prolate Spheroidal Wave Functions (PSWFs), where the expansion coefficients are evaluated using a specially designed numerical integration scheme. Then, the expansion coefficients are used to construct a rotationally-invariant covariance matrix which admits a block-diagonal structure, and the eigen-decomposition of its blocks provides us with the desired steerable principal components. The proposed method is shown to be faster than existing methods, while providing appropriate error bounds which guarantee its accuracy.
Submitted 9 August, 2018; v1 submitted 9 August, 2016;
originally announced August 2016.
-
Multi-View Kernel Consensus For Data Analysis
Authors:
Moshe Salhov,
Ofir Lindenbaum,
Yariv Aizenbud,
Avi Silberschatz,
Yoel Shkolnisky,
Amir Averbuch
Abstract:
The set of input data features for many data-driven tasks is high-dimensional, while the intrinsic dimension of the data is low. Data analysis methods aim to uncover the underlying low dimensional structure imposed by the low dimensional hidden parameters by utilizing distance metrics that consider the set of attributes as a single monolithic set. However, the transformation of the low dimensional phenomena into the measured high dimensional observations might distort the distance metric. This distortion can affect the desired estimated low dimensional geometric structure. In this paper, we suggest utilizing the redundancy in the attribute domain by partitioning the attributes into multiple subsets we call views. The proposed methods utilize the agreement, also called consensus, between different views to extract valuable geometric information that unifies multiple views about the intrinsic relationships among several different observations. This unification enhances the information that a single view or a simple concatenation of views provides.
Submitted 29 January, 2019; v1 submitted 28 June, 2016;
originally announced June 2016.
-
Machine olfaction using time scattering of sensor multiresolution graphs
Authors:
Leonid Gugel,
Yoel Shkolnisky,
Shai Dekel
Abstract:
In this paper we construct a learning architecture for high dimensional time series sampled by sensor arrangements. Using a redundant wavelet decomposition on a graph constructed over the sensor locations, our algorithm is able to construct discriminative features that exploit the mutual information between the sensors. The algorithm then applies scattering networks to the time series graphs to create the feature space. We demonstrate our method on a machine olfaction problem, where one needs to classify the gas type and the location from which it originates, using data sampled by an array of sensors. Our experimental results clearly demonstrate that our method outperforms classical machine learning techniques used in previous studies.
Submitted 13 February, 2016;
originally announced February 2016.
-
An algorithm for improving Non-Local Means operators via low-rank approximation
Authors:
Victor May,
Yosi Keller,
Nir Sharon,
Yoel Shkolnisky
Abstract:
We present a method for improving a Non Local Means operator by computing its low-rank approximation. The low-rank operator is constructed by applying a filter to the spectrum of the original Non Local Means operator. This results in an operator which is less sensitive to noise while preserving important properties of the original operator. The method is efficiently implemented based on Chebyshev polynomials and is demonstrated on the application of natural image denoising. For this application, we provide a comprehensive comparison of our method with leading denoising methods.
Submitted 20 November, 2014;
originally announced December 2014.
-
Fast Steerable Principal Component Analysis
Authors:
Zhizhen Zhao,
Yoel Shkolnisky,
Amit Singer
Abstract:
Cryo-electron microscopy nowadays often requires the analysis of hundreds of thousands of 2D images as large as a few hundred pixels in each direction. Here we introduce an algorithm that efficiently and accurately performs principal component analysis (PCA) for a large set of two-dimensional images, and, for each image, the set of its uniform rotations in the plane and their reflections. For a dataset consisting of $n$ images of size $L \times L$ pixels, the computational complexity of our algorithm is $O(nL^3 + L^4)$, while existing algorithms take $O(nL^4)$. The new algorithm computes the expansion coefficients of the images in a Fourier-Bessel basis efficiently using the non-uniform fast Fourier transform. We compare the accuracy and efficiency of the new algorithm with traditional PCA and existing algorithms for steerable PCA.
Submitted 15 December, 2015; v1 submitted 1 December, 2014;
originally announced December 2014.