Search | arXiv e-print repository

G-invariant diffusion maps

Authors: Eitan Rosen, Xiuyuan Cheng, Yoel Shkolnisky

Abstract: The diffusion maps embedding of data lying on a manifold have shown success in tasks ranging from dimensionality reduction and clustering, to data visualization. In this work, we consider embedding data sets which were sampled from a manifold which is closed under the action of a continuous matrix group. An example of such a data set is images who's planar rotations are arbitrary. The G-invariant… ▽ More The diffusion maps embedding of data lying on a manifold have shown success in tasks ranging from dimensionality reduction and clustering, to data visualization. In this work, we consider embedding data sets which were sampled from a manifold which is closed under the action of a continuous matrix group. An example of such a data set is images who's planar rotations are arbitrary. The G-invariant graph Laplacian, introduced in a previous work of the authors, admits eigenfunctions in the form of tensor products between the elements of the irreducible unitary representations of the group and eigenvectors of certain matrices. We employ these eigenfunctions to derive diffusion maps that intrinsically account for the group action on the data. In particular, we construct both equivariant and invariant embeddings which can be used naturally to cluster and align the data points. We demonstrate the effectiveness of our construction with simulated data. △ Less

Submitted 25 July, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

arXiv:2303.17001 [pdf, other]

The G-invariant graph Laplacian

Authors: Eitan Rosen, Paulina Hoyos, Xiuyuan Cheng, Joe Kileel, Yoel Shkolnisky

Abstract: Graph Laplacian based algorithms for data lying on a manifold have been proven effective for tasks such as dimensionality reduction, clustering, and denoising. In this work, we consider data sets whose data points lie on a manifold that is closed under the action of a known unitary matrix Lie group G. We propose to construct the graph Laplacian by incorporating the distances between all the pairs… ▽ More Graph Laplacian based algorithms for data lying on a manifold have been proven effective for tasks such as dimensionality reduction, clustering, and denoising. In this work, we consider data sets whose data points lie on a manifold that is closed under the action of a known unitary matrix Lie group G. We propose to construct the graph Laplacian by incorporating the distances between all the pairs of points generated by the action of G on the data set. We deem the latter construction the ``G-invariant Graph Laplacian'' (G-GL). We show that the G-GL converges to the Laplace-Beltrami operator on the data manifold, while enjoying a significantly improved convergence rate compared to the standard graph Laplacian which only utilizes the distances between the points in the given data set. Furthermore, we show that the G-GL admits a set of eigenfunctions that have the form of certain products between the group elements and eigenvectors of certain matrices, which can be estimated from the data efficiently using FFT-type algorithms. We demonstrate our construction and its advantages on the problem of filtering data on a noisy manifold closed under the action of the special unitary group SU(2). △ Less

Submitted 28 June, 2024; v1 submitted 29 March, 2023; originally announced March 2023.

arXiv:2201.06978 [pdf, other]

ASOCEM: Automatic Segmentation Of Contaminations in cryo-EM

Authors: Amitay Eldar, Ido Amos, Yoel Shkolnisky

Abstract: Particle picking is currently a critical step in the cryo-electron microscopy single particle reconstruction pipeline. Contaminations in the acquired micrographs severely degrade the performance of particle pickers, resulting is many ``non-particles'' in the collected stack of particles. In this paper, we present ASOCEM (Automatic Segmentation Of Contaminations in cryo-EM), an automatic method to… ▽ More Particle picking is currently a critical step in the cryo-electron microscopy single particle reconstruction pipeline. Contaminations in the acquired micrographs severely degrade the performance of particle pickers, resulting is many ``non-particles'' in the collected stack of particles. In this paper, we present ASOCEM (Automatic Segmentation Of Contaminations in cryo-EM), an automatic method to detect and segment contaminations, which requires as an input only the approximated particle size. In particular, it does not require any parameter tuning nor manual intervention. Our method is based on the observation that the statistical distribution of contaminated regions is different from that of the rest of the micrograph. This nonrestrictive assumption allows to automatically detect various types of contaminations, from the carbon edges of the supporting grid to high contrast blobs of different sizes. We demonstrate the efficiency of our algorithm using various experimental data sets containing various types of contaminations. ASOCEM is integrated as part of the KLT picker \cite{ELDAR2020107473} and is available at \url{https://github.com/ShkolniskyLab/kltpicker2}. △ Less

Submitted 18 January, 2022; originally announced January 2022.

arXiv:2009.02955 [pdf, other]

A Perturbation-Based Kernel Approximation Framework

Authors: Roy Mitz, Yoel Shkolnisky

Abstract: Kernel methods are powerful tools in various data analysis tasks. Yet, in many cases, their time and space complexity render them impractical for large datasets. Various kernel approximation methods were proposed to overcome this issue, with the most prominent method being the Nystr{ö}m method. In this paper, we derive a perturbation-based kernel approximation framework building upon results from… ▽ More Kernel methods are powerful tools in various data analysis tasks. Yet, in many cases, their time and space complexity render them impractical for large datasets. Various kernel approximation methods were proposed to overcome this issue, with the most prominent method being the Nystr{ö}m method. In this paper, we derive a perturbation-based kernel approximation framework building upon results from classical perturbation theory. We provide an error analysis for this framework, and prove that in fact, it generalizes the Nystr{ö}m method and several of its variants. Furthermore, we show that our framework gives rise to new kernel approximation schemes, that can be tuned to take advantage of the structure of the approximated kernel matrix. We support our theoretical results numerically and demonstrate the advantages of our approximation framework on both synthetic and real-world data. △ Less

Submitted 23 May, 2022; v1 submitted 7 September, 2020; originally announced September 2020.

Comments: 25 pages, 6 figures

arXiv:1911.11049 [pdf, other]

ROIPCA: An online memory-restricted PCA algorithm based on rank-one updates

Authors: Roy Mitz, Yoel Shkolnisky

Abstract: Principal components analysis (PCA) is a fundamental algorithm in data analysis. Its memory-restricted online versions are useful in many modern applications, where the data are too large to fit in memory, or when data arrive as a stream of items. In this paper, we propose ROIPCA and fROIPCA, two online PCA algorithms that are based on rank-one updates. While ROIPCA is typically more accurate, fRO… ▽ More Principal components analysis (PCA) is a fundamental algorithm in data analysis. Its memory-restricted online versions are useful in many modern applications, where the data are too large to fit in memory, or when data arrive as a stream of items. In this paper, we propose ROIPCA and fROIPCA, two online PCA algorithms that are based on rank-one updates. While ROIPCA is typically more accurate, fROIPCA is faster and has comparable accuracy. We show the relation between fROIPCA and an existing popular gradient algorithm for online PCA, and in particular, prove that fROIPCA is in fact a gradient algorithm with an optimal learning rate. We demonstrate numerically the advantages of our algorithms over existing state-of-the-art algorithms in terms of accuracy and runtime. △ Less

Submitted 7 June, 2023; v1 submitted 25 November, 2019; originally announced November 2019.

Comments: 23 pages, 2 figures

arXiv:1906.00211 [pdf, ps, other]

doi 10.1093/imaiai/iaaa019

Multi-reference factor analysis: low-rank covariance estimation under unknown translations

Authors: Boris Landa, Yoel Shkolnisky

Abstract: We consider the problem of estimating the covariance matrix of a random signal observed through unknown translations (modeled by cyclic shifts) and corrupted by noise. Solving this problem allows to discover low-rank structures masked by the existence of translations (which act as nuisance parameters), with direct application to Principal Components Analysis (PCA). We assume that the underlying si… ▽ More We consider the problem of estimating the covariance matrix of a random signal observed through unknown translations (modeled by cyclic shifts) and corrupted by noise. Solving this problem allows to discover low-rank structures masked by the existence of translations (which act as nuisance parameters), with direct application to Principal Components Analysis (PCA). We assume that the underlying signal is of length $L$ and follows a standard factor model with mean zero and $r$ normally-distributed factors. To recover the covariance matrix in this case, we propose to employ the second- and fourth-order shift-invariant moments of the signal known as the $\textit{power spectrum}$ and the $\textit{trispectrum}$. We prove that they are sufficient for recovering the covariance matrix (under a certain technical condition) when $r<\sqrt{L}$. Correspondingly, we provide a polynomial-time procedure for estimating the covariance matrix from many (translated and noisy) observations, where no explicit knowledge of $r$ is required, and prove the procedure's statistical consistency. While our results establish that covariance estimation is possible from the power spectrum and the trispectrum for low-rank covariance matrices, we prove that this is not the case for full-rank covariance matrices. We conduct numerical experiments that corroborate our theoretical findings, and demonstrate the favorable performance of our algorithms in various settings, including in high levels of noise. △ Less

Submitted 21 September, 2020; v1 submitted 1 June, 2019; originally announced June 2019.

arXiv:1905.12442 [pdf, other]

Rank-one Multi-Reference Factor Analysis

Authors: Yariv Aizenbud, Boris Landa, Yoel Shkolnisky

Abstract: In recent years, there is a growing need for processing methods aimed at extracting useful information from large datasets. In many cases the challenge is to discover a low-dimensional structure in the data, often concealed by the existence of nuisance parameters and noise. Motivated by such challenges, we consider the problem of estimating a signal from its scaled, cyclically-shifted and noisy ob… ▽ More In recent years, there is a growing need for processing methods aimed at extracting useful information from large datasets. In many cases the challenge is to discover a low-dimensional structure in the data, often concealed by the existence of nuisance parameters and noise. Motivated by such challenges, we consider the problem of estimating a signal from its scaled, cyclically-shifted and noisy observations. We focus on the particularly challenging regime of low signal-to-noise ratio (SNR), where different observations cannot be shift-aligned. We show that an accurate estimation of the signal from its noisy observations is possible, and derive a procedure which is proved to consistently estimate the signal. The asymptotic sample complexity (the number of observations required to recover the signal) of the procedure is $1/\operatorname{SNR}^4$. Additionally, we propose a procedure which is experimentally shown to improve the sample complexity by a factor equal to the signal's length. Finally, we present numerical experiments which demonstrate the performance of our algorithms, and corroborate our theoretical findings. △ Less

Submitted 4 June, 2019; v1 submitted 29 May, 2019; originally announced May 2019.

arXiv:1901.10888 [pdf, ps, other]

doi 10.1088/1361-6420/ab2fb2

A common lines approach for ab-initio modeling of cyclically-symmetric molecules

Authors: Gabi Pragier, Yoel Shkolnisky

Abstract: One of the challenges in single particle reconstruction in cryo-electron microscopy is to find a three-dimensional model of a molecule using its two-dimensional noisy projection-images. In this paper, we propose a robust "angular reconstitution" algorithm for molecules with $n$-fold cyclic symmetry, that estimates the orientation parameters of the projections-images. Our suggested method utilizes… ▽ More One of the challenges in single particle reconstruction in cryo-electron microscopy is to find a three-dimensional model of a molecule using its two-dimensional noisy projection-images. In this paper, we propose a robust "angular reconstitution" algorithm for molecules with $n$-fold cyclic symmetry, that estimates the orientation parameters of the projections-images. Our suggested method utilizes self common lines which induce identical lines within the Fourier transform of each projection-image. We show that the location of self common lines admits quite a few favorable geometrical constraints, thus allowing to detect them even in a noisy setting. In addition, for molecules with higher order rotational symmetry, our proposed method exploits the fact that there exist numerous common lines between any two Fourier transformed projection-images of such molecules, thus allowing to determine their relative orientation even under high levels of noise. The efficacy of our proposed method is demonstrated using numerical experiments conducted on simulated and experimental data. △ Less

Submitted 24 June, 2019; v1 submitted 20 January, 2019; originally announced January 2019.

MSC Class: 92-08; 62-07; 62P10

arXiv:1802.01894 [pdf, ps, other]

The steerable graph Laplacian and its application to filtering image data-sets

Authors: Boris Landa, Yoel Shkolnisky

Abstract: In recent years, improvements in various image acquisition techniques gave rise to the need for adaptive processing methods, aimed particularly for large datasets corrupted by noise and deformations. In this work, we consider datasets of images sampled from a low-dimensional manifold (i.e. an image-valued manifold), where the images can assume arbitrary planar rotations. To derive an adaptive and… ▽ More In recent years, improvements in various image acquisition techniques gave rise to the need for adaptive processing methods, aimed particularly for large datasets corrupted by noise and deformations. In this work, we consider datasets of images sampled from a low-dimensional manifold (i.e. an image-valued manifold), where the images can assume arbitrary planar rotations. To derive an adaptive and rotation-invariant framework for processing such datasets, we introduce a graph Laplacian (GL)-like operator over the dataset, termed ${\textit{steerable graph Laplacian}}$. Essentially, the steerable GL extends the standard GL by accounting for all (infinitely-many) planar rotations of all images. As it turns out, similarly to the standard GL, a properly normalized steerable GL converges to the Laplace-Beltrami operator on the low-dimensional manifold. However, the steerable GL admits an improved convergence rate compared to the GL, where the improved convergence behaves as if the intrinsic dimension of the underlying manifold is lower by one. Moreover, it is shown that the steerable GL admits eigenfunctions of the form of Fourier modes (along the orbits of the images' rotations) multiplied by eigenvectors of certain matrices, which can be computed efficiently by the FFT. For image datasets corrupted by noise, we employ a subset of these eigenfunctions to "filter" the dataset via a Fourier-like filtering scheme, essentially using all images and their rotations simultaneously. We demonstrate our filtering framework by de-noising simulated single-particle cryo-EM image datasets. △ Less

Submitted 7 August, 2018; v1 submitted 6 February, 2018; originally announced February 2018.

arXiv:1609.01100 [pdf, other]

A max-cut approach to heterogeneity in cryo-electron microscopy

Authors: Yariv Aizenbud, Yoel Shkolnisky

Abstract: The field of cryo-electron microscopy has made astounding advancements in the past few years, mainly due to advancements in electron detectors' technology. Yet, one of the key open challenges of the field remains the processing of heterogeneous data sets, produced from samples containing particles at several different conformational states. For such data sets, the algorithms must include some clas… ▽ More The field of cryo-electron microscopy has made astounding advancements in the past few years, mainly due to advancements in electron detectors' technology. Yet, one of the key open challenges of the field remains the processing of heterogeneous data sets, produced from samples containing particles at several different conformational states. For such data sets, the algorithms must include some classification procedure to identify homogeneous groups within the data, so that the images in each group correspond to the same underlying structure. The fundamental importance of the heterogeneity problem in cryo-electron microscopy has drawn many research efforts, and resulted in significant progress in classification algorithms for heterogeneous data sets. While these algorithms are extremely useful and effective in practice, they lack rigorous mathematical analysis and performance guarantees. In this paper, we attempt to make the first steps towards rigorous mathematical analysis of the heterogeneity problem in cryo-electron microscopy. To that end, we present an algorithm for processing heterogeneous data sets, and prove accuracy and stability bounds for it. We also suggest an extension of this algorithm that combines the classification and reconstruction steps. We demonstrate it on simulated data, and compare its performance to the state-of-the-art algorithm in RELION. △ Less

Submitted 3 October, 2019; v1 submitted 5 September, 2016; originally announced September 2016.

arXiv:1608.02702 [pdf, ps, other]

doi 10.1137/16M1085334

Steerable Principal Components for Space-Frequency Localized Images

Authors: Boris Landa, Yoel Shkolnisky

Abstract: This paper describes a fast and accurate method for obtaining steerable principal components from a large dataset of images, assuming the images are well localized in space and frequency. The obtained steerable principal components are optimal for expanding the images in the dataset and all of their rotations. The method relies upon first expanding the images using a series of two-dimensional Prol… ▽ More This paper describes a fast and accurate method for obtaining steerable principal components from a large dataset of images, assuming the images are well localized in space and frequency. The obtained steerable principal components are optimal for expanding the images in the dataset and all of their rotations. The method relies upon first expanding the images using a series of two-dimensional Prolate Spheroidal Wave Functions (PSWFs), where the expansion coefficients are evaluated using a specially designed numerical integration scheme. Then, the expansion coefficients are used to construct a rotationally-invariant covariance matrix which admits a block-diagonal structure, and the eigen-decomposition of its blocks provides us with the desired steerable principal components. The proposed method is shown to be faster then existing methods, while providing appropriate error bounds which guarantee its accuracy. △ Less

Submitted 9 August, 2018; v1 submitted 9 August, 2016; originally announced August 2016.

arXiv:1606.08819 [pdf, other]

Multi-View Kernel Consensus For Data Analysis

Authors: Moshe Salhov, Ofir Lindenbaum, Yariv Aizenbud, Avi Silberschatz, Yoel Shkolnisky, Amir Averbuch

Abstract: The input data features set for many data driven tasks is high-dimensional while the intrinsic dimension of the data is low. Data analysis methods aim to uncover the underlying low dimensional structure imposed by the low dimensional hidden parameters by utilizing distance metrics that consider the set of attributes as a single monolithic set. However, the transformation of the low dimensional phe… ▽ More The input data features set for many data driven tasks is high-dimensional while the intrinsic dimension of the data is low. Data analysis methods aim to uncover the underlying low dimensional structure imposed by the low dimensional hidden parameters by utilizing distance metrics that consider the set of attributes as a single monolithic set. However, the transformation of the low dimensional phenomena into the measured high dimensional observations might distort the distance metric, This distortion can effect the desired estimated low dimensional geometric structure. In this paper, we suggest to utilize the redundancy in the attribute domain by partitioning the attributes into multiple subsets we call views. The proposed methods utilize the agreement also called consensus between different views to extract valuable geometric information that unifies multiple views about the intrinsic relationships among several different observations. This unification enhances the information that a single view or a simple concatenations of views provides. △ Less

Submitted 29 January, 2019; v1 submitted 28 June, 2016; originally announced June 2016.

arXiv:1602.04358 [pdf, ps, other]

Machine olfaction using time scattering of sensor multiresolution graphs

Authors: Leonid Gugel, Yoel Shkolnisky, Shai Dekel

Abstract: In this paper we construct a learning architecture for high dimensional time series sampled by sensor arrangements. Using a redundant wavelet decomposition on a graph constructed over the sensor locations, our algorithm is able to construct discriminative features that exploit the mutual information between the sensors. The algorithm then applies scattering networks to the time series graphs to cr… ▽ More In this paper we construct a learning architecture for high dimensional time series sampled by sensor arrangements. Using a redundant wavelet decomposition on a graph constructed over the sensor locations, our algorithm is able to construct discriminative features that exploit the mutual information between the sensors. The algorithm then applies scattering networks to the time series graphs to create the feature space. We demonstrate our method on a machine olfaction problem, where one needs to classify the gas type and the location where it originates from data sampled by an array of sensors. Our experimental results clearly demonstrate that our method outperforms classical machine learning techniques used in previous studies. △ Less

Submitted 13 February, 2016; originally announced February 2016.

arXiv:1412.2067 [pdf, ps, other]

An algorithm for improving Non-Local Means operators via low-rank approximation

Authors: Victor May, Yosi Keller, Nir Sharon, Yoel Shkolnisky

Abstract: We present a method for improving a Non Local Means operator by computing its low-rank approximation. The low-rank operator is constructed by applying a filter to the spectrum of the original Non Local Means operator. This results in an operator which is less sensitive to noise while preserving important properties of the original operator. The method is efficiently implemented based on Chebyshev… ▽ More We present a method for improving a Non Local Means operator by computing its low-rank approximation. The low-rank operator is constructed by applying a filter to the spectrum of the original Non Local Means operator. This results in an operator which is less sensitive to noise while preserving important properties of the original operator. The method is efficiently implemented based on Chebyshev polynomials and is demonstrated on the application of natural images denoising. For this application, we provide a comprehensive comparison of our method with leading denoising methods. △ Less

Submitted 20 November, 2014; originally announced December 2014.

arXiv:1412.0781 [pdf, ps, other]

Fast Steerable Principal Component Analysis

Authors: Zhizhen Zhao, Yoel Shkolnisky, Amit Singer

Abstract: Cryo-electron microscopy nowadays often requires the analysis of hundreds of thousands of 2D images as large as a few hundred pixels in each direction. Here we introduce an algorithm that efficiently and accurately performs principal component analysis (PCA) for a large set of two-dimensional images, and, for each image, the set of its uniform rotations in the plane and their reflections. For a da… ▽ More Cryo-electron microscopy nowadays often requires the analysis of hundreds of thousands of 2D images as large as a few hundred pixels in each direction. Here we introduce an algorithm that efficiently and accurately performs principal component analysis (PCA) for a large set of two-dimensional images, and, for each image, the set of its uniform rotations in the plane and their reflections. For a dataset consisting of $n$ images of size $L \times L$ pixels, the computational complexity of our algorithm is $O(nL^3 + L^4)$, while existing algorithms take $O(nL^4)$. The new algorithm computes the expansion coefficients of the images in a Fourier-Bessel basis efficiently using the non-uniform fast Fourier transform. We compare the accuracy and efficiency of the new algorithm with traditional PCA and existing algorithms for steerable PCA. △ Less

Submitted 15 December, 2015; v1 submitted 1 December, 2014; originally announced December 2014.

Showing 1–15 of 15 results for author: Shkolnisky, Y