-
Object detection under the linear subspace model with application to cryo-EM images
Authors:
Amitay Eldar,
Keren Mor Waknin,
Samuel Davenport,
Tamir Bendory,
Armin Schwartzman,
Yoel Shkolnisky
Abstract:
Detecting multiple unknown objects in noisy data is a key problem in many scientific fields, such as electron microscopy imaging. A common model for the unknown objects is the linear subspace model, which assumes that the objects can be expanded in some known basis (such as the Fourier basis). In this paper, we develop an object detection algorithm that under the linear subspace model is asymptotically guaranteed to detect all objects, while controlling the family-wise error rate or the false discovery rate. Numerical simulations show that the algorithm also controls the error rate with high power in the non-asymptotic regime, even in highly challenging regimes. We apply the proposed algorithm to an experimental electron microscopy data set, and show that it outperforms existing standard software.
Submitted 1 May, 2024;
originally announced May 2024.
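As an illustration of what "controlling the false discovery rate" means in practice, the following is a minimal sketch of the classical Benjamini-Hochberg step-up procedure. It is a standard textbook method and is not claimed to be the detection algorithm of the paper; the function name `benjamini_hochberg`, the level `q`, and the example p-values are assumptions made for this sketch.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the (sorted) indices of hypotheses rejected by the
    Benjamini-Hochberg step-up procedure at FDR level q.

    Illustrative only: a generic FDR-controlling rule, not the
    paper's detection algorithm.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: remember the largest rank whose p-value
        # lies below the line q * rank / m.
        if p_values[idx] <= q * rank / m:
            k_max = rank
    # Reject the k_max smallest p-values.
    return sorted(order[:k_max])


# Hypothetical p-values for ten candidate detections.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p, q=0.05)
print(rejected)
```

With these example values only the two smallest p-values fall below their thresholds (0.005 and 0.01), so only the first two candidates are declared detections.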
-
G-invariant diffusion maps
Authors:
Eitan Rosen,
Xiuyuan Cheng,
Yoel Shkolnisky
Abstract:
The diffusion maps embedding of data lying on a manifold has shown success in tasks ranging from dimensionality reduction and clustering to data visualization. In this work, we consider embedding data sets that were sampled from a manifold that is closed under the action of a continuous matrix group. An example of such a data set is images whose planar rotations are arbitrary. The G-invariant graph Laplacian, introduced in a previous work of the authors, admits eigenfunctions in the form of tensor products between the elements of the irreducible unitary representations of the group and eigenvectors of certain matrices. We employ these eigenfunctions to derive diffusion maps that intrinsically account for the group action on the data. In particular, we construct both equivariant and invariant embeddings which can be used naturally to cluster and align the data points. We demonstrate the effectiveness of our construction with simulated data.
Submitted 25 July, 2023; v1 submitted 12 June, 2023;
originally announced June 2023.
-
The G-invariant graph Laplacian
Authors:
Eitan Rosen,
Paulina Hoyos,
Xiuyuan Cheng,
Joe Kileel,
Yoel Shkolnisky
Abstract:
Graph Laplacian-based algorithms for data lying on a manifold have proven effective for tasks such as dimensionality reduction, clustering, and denoising. In this work, we consider data sets whose data points lie on a manifold that is closed under the action of a known unitary matrix Lie group G. We propose to construct the graph Laplacian by incorporating the distances between all the pairs of points generated by the action of G on the data set. We term the latter construction the "G-invariant Graph Laplacian" (G-GL). We show that the G-GL converges to the Laplace-Beltrami operator on the data manifold, while enjoying a significantly improved convergence rate compared to the standard graph Laplacian, which only utilizes the distances between the points in the given data set. Furthermore, we show that the G-GL admits a set of eigenfunctions that have the form of certain products between the group elements and eigenvectors of certain matrices, which can be estimated from the data efficiently using FFT-type algorithms. We demonstrate our construction and its advantages on the problem of filtering data on a noisy manifold closed under the action of the special unitary group SU(2).
Submitted 28 June, 2024; v1 submitted 29 March, 2023;
originally announced March 2023.
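For reference, the standard graph Laplacian that the abstract above compares against can be sketched as follows. This is only the baseline construction (Gaussian kernel on pairwise distances, random-walk normalization), not the G-invariant version; the function name `graph_laplacian`, the bandwidth `epsilon`, and the sign convention L = I - D^{-1} W are assumptions of this sketch.

```python
import math


def graph_laplacian(points, epsilon):
    """Random-walk graph Laplacian L = I - D^{-1} W built from a
    Gaussian kernel W[i][j] = exp(-||x_i - x_j||^2 / epsilon).

    Baseline construction only; it uses just the distances between
    the given points, with no group action involved.
    """
    n = len(points)
    W = [
        [
            math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], points[j])) / epsilon)
            for j in range(n)
        ]
        for i in range(n)
    ]
    L = []
    for i in range(n):
        degree = sum(W[i])  # row sum, i.e. the i-th diagonal entry of D
        L.append([(1.0 if i == j else 0.0) - W[i][j] / degree for j in range(n)])
    return L


# Eight points sampled from the unit circle (a one-dimensional manifold).
points = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8)) for k in range(8)]
L = graph_laplacian(points, epsilon=0.5)
row_sums = [sum(row) for row in L]  # each row sums to zero up to rounding
```

Each row of L sums to zero, so constant vectors lie in its null space, mirroring the fact that the Laplace-Beltrami operator annihilates constant functions.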
-
Signal enhancement for two-dimensional cryo-EM data processing
Authors:
Guy Sharon,
Yoel Shkolnisky,
Tamir Bendory
Abstract:
Different tasks in the computational pipeline of single-particle cryo-electron microscopy (cryo-EM) require enhancing the quality of the highly noisy raw images. To this end, we develop an efficient algorithm for signal enhancement of cryo-EM images. The enhanced images can be used for a variety of downstream tasks, such as 2-D classification, removing uninformative images, constructing ab initio models, generating templates for particle picking, providing a quick assessment of the data set, dimensionality reduction, and symmetry detection. The algorithm includes built-in quality measures to assess its performance and alleviate the risk of model bias. We demonstrate the effectiveness of the proposed algorithm on several experimental data sets. In particular, we show that the quality of the resulting images is high enough to produce ab initio models of $\sim 10$ Å resolution. The algorithm is accompanied by publicly available, documented, and easy-to-use code.
Submitted 2 December, 2022;
originally announced December 2022.
-
A common lines approach for ab-initio modeling of molecules with tetrahedral and octahedral symmetry
Authors:
Adi Shasha Geva,
Yoel Shkolnisky
Abstract:
A main task in cryo-electron microscopy single particle reconstruction is to find a three-dimensional model of a molecule given a set of its randomly oriented and positioned noisy projection-images. In this work, we propose an algorithm for ab-initio reconstruction for molecules with tetrahedral or octahedral symmetry. The algorithm exploits the multiple common lines between each pair of projection-images as well as self common lines within each image. It is robust to noise in the input images as it integrates the information from all images at once. The applicability of the proposed algorithm is demonstrated using experimental cryo-electron microscopy data.
Submitted 10 November, 2022; v1 submitted 17 June, 2022;
originally announced June 2022.
-
Three-Dimensional Alignment of Density Maps in Cryo-Electron Microscopy
Authors:
Yael Harpaz,
Yoel Shkolnisky
Abstract:
A common task in cryo-electron microscopy (cryo-EM) data processing is to compare three-dimensional density maps of macromolecules. In this paper, we propose an algorithm for aligning three-dimensional density maps that exploits common lines between projection images of the maps. The algorithm is fully automatic and handles rotations, reflections (handedness), and translations between the maps. In addition, the algorithm is applicable to any type of molecular symmetry without requiring any information regarding the symmetry of the maps. We evaluate our alignment algorithm on publicly available density maps, demonstrating its accuracy and efficiency. The algorithm is available at https://github.com/ShkolniskyLab/emalign.
Submitted 24 February, 2023; v1 submitted 16 June, 2022;
originally announced June 2022.
-
ASOCEM: Automatic Segmentation Of Contaminations in cryo-EM
Authors:
Amitay Eldar,
Ido Amos,
Yoel Shkolnisky
Abstract:
Particle picking is currently a critical step in the cryo-electron microscopy single particle reconstruction pipeline. Contaminations in the acquired micrographs severely degrade the performance of particle pickers, resulting in many "non-particles" in the collected stack of particles. In this paper, we present ASOCEM (Automatic Segmentation Of Contaminations in cryo-EM), an automatic method to detect and segment contaminations, which requires as an input only the approximate particle size. In particular, it requires neither parameter tuning nor manual intervention. Our method is based on the observation that the statistical distribution of contaminated regions is different from that of the rest of the micrograph. This nonrestrictive assumption allows automatic detection of various types of contaminations, from the carbon edges of the supporting grid to high-contrast blobs of different sizes. We demonstrate the efficiency of our algorithm using various experimental data sets containing various types of contaminations. ASOCEM is integrated as part of the KLT picker \cite{ELDAR2020107473} and is available at https://github.com/ShkolniskyLab/kltpicker2.
Submitted 18 January, 2022;
originally announced January 2022.
-
A Perturbation-Based Kernel Approximation Framework
Authors:
Roy Mitz,
Yoel Shkolnisky
Abstract:
Kernel methods are powerful tools in various data analysis tasks. Yet, in many cases, their time and space complexity render them impractical for large datasets. Various kernel approximation methods have been proposed to overcome this issue, the most prominent being the Nyström method. In this paper, we derive a perturbation-based kernel approximation framework building upon results from classical perturbation theory. We provide an error analysis for this framework, and prove that it in fact generalizes the Nyström method and several of its variants. Furthermore, we show that our framework gives rise to new kernel approximation schemes that can be tuned to take advantage of the structure of the approximated kernel matrix. We support our theoretical results numerically and demonstrate the advantages of our approximation framework on both synthetic and real-world data.
Submitted 23 May, 2022; v1 submitted 7 September, 2020;
originally announced September 2020.
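For context, the classical Nyström approximation mentioned in the abstract above can be written as follows. The notation ($K$, $S$, $C$, $W$, $m$) is chosen here for illustration and is not taken from the paper; the perturbation-based framework itself is not reproduced.

```latex
% Classical Nystrom approximation of a symmetric positive semidefinite
% kernel matrix K (n x n) from m << n landmark columns indexed by S.
\[
  C = K_{:,S} \in \mathbb{R}^{n \times m}, \qquad
  W = K_{S,S} \in \mathbb{R}^{m \times m}, \qquad
  \widetilde{K} = C \, W^{\dagger} C^{\top} \approx K ,
\]
% where W^\dagger denotes the Moore-Penrose pseudoinverse of W.
```

Only the $n \times m$ block $C$ has to be computed and stored, which is the source of the savings in time and space relative to forming the full $n \times n$ matrix.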
-
Super-resolution SAXS based on PSF engineering and sub-pixel detector translations
Authors:
Benjamin Gutman,
Michael Mrejen,
Gil Shabat,
Ram Avinery,
Yoel Shkolnisky,
Roy Beck
Abstract:
The small-angle X-ray scattering (SAXS) technique enables convenient nanoscopic characterization for various systems and conditions. Nonetheless, lab-based SAXS systems intrinsically suffer from insufficient X-ray flux and limited angular resolution. Here, we develop a two-step reconstruction methodology to enhance the angular resolution for given experimental conditions. Using minute hardware additions, we show that translating the X-ray detector in subpixel steps and modifying the incoming beam shape results in a set of 2D scattering images which is sufficient for super-resolution SAXS reconstruction. The technique is verified experimentally to show an increase of more than 25% in resolution. Such advantages have a direct impact on the ability to resolve faster and finer nanoscopic structures and can be implemented in most existing SAXS apparatuses.
Submitted 28 February, 2020;
originally announced February 2020.
-
KLT Picker: Particle Picking Using Data-Driven Optimal Templates
Authors:
Amitay Eldar,
Boris Landa,
Yoel Shkolnisky
Abstract:
Particle picking is currently a critical step in the cryo-EM single particle reconstruction pipeline. Despite extensive work on this problem, for many data sets it is still challenging, especially for low SNR micrographs. We present the KLT (Karhunen-Loève Transform) picker, which is fully automatic and requires as an input only the approximate particle size. In particular, it does not require any manual picking. Our method is designed especially to handle low SNR micrographs. It is based on learning a set of optimal templates through the use of multivariate statistical analysis via the Karhunen-Loève Transform. We evaluate the KLT picker on publicly available data sets and present high-quality results with minimal manual effort.
Submitted 12 December, 2019;
originally announced December 2019.
-
ROIPCA: An online memory-restricted PCA algorithm based on rank-one updates
Authors:
Roy Mitz,
Yoel Shkolnisky
Abstract:
Principal components analysis (PCA) is a fundamental algorithm in data analysis. Its memory-restricted online versions are useful in many modern applications, where the data are too large to fit in memory, or when data arrive as a stream of items. In this paper, we propose ROIPCA and fROIPCA, two online PCA algorithms that are based on rank-one updates. While ROIPCA is typically more accurate, fROIPCA is faster and has comparable accuracy. We show the relation between fROIPCA and an existing popular gradient algorithm for online PCA, and in particular, prove that fROIPCA is in fact a gradient algorithm with an optimal learning rate. We demonstrate numerically the advantages of our algorithms over existing state-of-the-art algorithms in terms of accuracy and runtime.
Submitted 7 June, 2023; v1 submitted 25 November, 2019;
originally announced November 2019.
-
Multi-reference factor analysis: low-rank covariance estimation under unknown translations
Authors:
Boris Landa,
Yoel Shkolnisky
Abstract:
We consider the problem of estimating the covariance matrix of a random signal observed through unknown translations (modeled by cyclic shifts) and corrupted by noise. Solving this problem makes it possible to discover low-rank structures masked by the existence of translations (which act as nuisance parameters), with direct application to Principal Components Analysis (PCA). We assume that the underlying signal is of length $L$ and follows a standard factor model with mean zero and $r$ normally-distributed factors. To recover the covariance matrix in this case, we propose to employ the second- and fourth-order shift-invariant moments of the signal, known as the power spectrum and the trispectrum. We prove that they are sufficient for recovering the covariance matrix (under a certain technical condition) when $r<\sqrt{L}$. Correspondingly, we provide a polynomial-time procedure for estimating the covariance matrix from many (translated and noisy) observations, where no explicit knowledge of $r$ is required, and prove the procedure's statistical consistency. While our results establish that covariance estimation is possible from the power spectrum and the trispectrum for low-rank covariance matrices, we prove that this is not the case for full-rank covariance matrices. We conduct numerical experiments that corroborate our theoretical findings, and demonstrate the favorable performance of our algorithms in various settings, including in high levels of noise.
Submitted 21 September, 2020; v1 submitted 1 June, 2019;
originally announced June 2019.
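The shift-invariance that the abstract above relies on can be checked numerically for the second-order moment. The sketch below computes the power spectrum of a signal and of a cyclically shifted copy with a naive DFT and confirms that they agree; the helper names (`dft`, `power_spectrum`, `cyclic_shift`) and the test signal are assumptions of this example, and the fourth-order trispectrum is not covered.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real or complex list."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def power_spectrum(x):
    """Squared magnitudes of the DFT coefficients."""
    return [abs(c) ** 2 for c in dft(x)]


def cyclic_shift(x, s):
    """Cyclically shift the list x to the right by s positions."""
    s %= len(x)
    return x[-s:] + x[:-s]


signal = [1.0, -2.0, 0.5, 3.0, 0.0, -1.5]
shifted = cyclic_shift(signal, 2)

# A cyclic shift multiplies each DFT coefficient by a unit-modulus phase,
# so the squared magnitudes are unchanged.
max_diff = max(
    abs(a - b) for a, b in zip(power_spectrum(signal), power_spectrum(shifted))
)
print(max_diff)  # zero up to floating-point rounding
```

Because the power spectrum discards the phases, it cannot distinguish a signal from its shifts, which is why it can be averaged over many observations without aligning them first.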
-
Rank-one Multi-Reference Factor Analysis
Authors:
Yariv Aizenbud,
Boris Landa,
Yoel Shkolnisky
Abstract:
In recent years, there has been a growing need for processing methods aimed at extracting useful information from large datasets. In many cases the challenge is to discover a low-dimensional structure in the data, often concealed by the existence of nuisance parameters and noise. Motivated by such challenges, we consider the problem of estimating a signal from its scaled, cyclically-shifted and noisy observations. We focus on the particularly challenging regime of low signal-to-noise ratio (SNR), where different observations cannot be shift-aligned. We show that an accurate estimation of the signal from its noisy observations is possible, and derive a procedure which is proved to consistently estimate the signal. The asymptotic sample complexity (the number of observations required to recover the signal) of the procedure is $1/\operatorname{SNR}^4$. Additionally, we propose a procedure which is experimentally shown to improve the sample complexity by a factor equal to the signal's length. Finally, we present numerical experiments which demonstrate the performance of our algorithms, and corroborate our theoretical findings.
Submitted 4 June, 2019; v1 submitted 29 May, 2019;
originally announced May 2019.
-
Common lines ab-initio reconstruction of $D_2$-symmetric molecules
Authors:
Eitan Rosen,
Yoel Shkolnisky
Abstract:
Cryo-electron microscopy is a state-of-the-art method for determining high-resolution three-dimensional models of molecules, from their two-dimensional projection images taken by an electron microscope. A crucial step in this method is to determine a low-resolution model of the molecule using only the given projection images, without using any three-dimensional information, such as an assumed reference model. For molecules without symmetry, this is often done by exploiting common lines between pairs of images. Common lines algorithms have been recently devised for molecules with cyclic symmetry, but no such algorithms exist for molecules with dihedral symmetry. In this work, we present a common lines algorithm for determining the structure of molecules with $D_{2}$ symmetry. The algorithm exploits the common lines between all pairs of images simultaneously, as well as common lines within each image. We demonstrate the applicability of our algorithm using experimental cryo-electron microscopy data.
Submitted 21 March, 2019;
originally announced April 2019.
-
A common lines approach for ab-initio modeling of cyclically-symmetric molecules
Authors:
Gabi Pragier,
Yoel Shkolnisky
Abstract:
One of the challenges in single particle reconstruction in cryo-electron microscopy is to find a three-dimensional model of a molecule using its two-dimensional noisy projection-images. In this paper, we propose a robust "angular reconstitution" algorithm for molecules with $n$-fold cyclic symmetry that estimates the orientation parameters of the projection-images. Our suggested method utilizes self common lines, which induce identical lines within the Fourier transform of each projection-image. We show that the location of self common lines admits quite a few favorable geometrical constraints, making it possible to detect them even in a noisy setting. In addition, for molecules with higher order rotational symmetry, our proposed method exploits the fact that there exist numerous common lines between any two Fourier transformed projection-images of such molecules, making it possible to determine their relative orientation even under high levels of noise. The efficacy of our proposed method is demonstrated using numerical experiments conducted on simulated and experimental data.
Submitted 24 June, 2019; v1 submitted 20 January, 2019;
originally announced January 2019.
-
Sampling and Approximation of Bandlimited Volumetric Data
Authors:
Rami Katz,
Yoel Shkolnisky
Abstract:
We present an approximation scheme for functions in three dimensions that requires only their samples on the Cartesian grid, under the assumption that the functions are sufficiently concentrated in both space and frequency. The scheme is based on expanding the given function in the basis of generalized prolate spheroidal wavefunctions, with the expansion coefficients given by weighted dot products between the samples of the function and the samples of the basis functions. As numerical implementations require all expansions to be finite, we present a truncation rule for the expansions. Finally, we derive a bound on the overall approximation error in terms of the assumed space/frequency concentration.
Submitted 17 November, 2018;
originally announced November 2018.
-
Common lines modeling for reference free ab-initio reconstruction in cryo-EM
Authors:
Ido Greenberg,
Yoel Shkolnisky
Abstract:
We consider the problem of estimating an unbiased and reference-free ab-initio model for non-symmetric molecules from images generated by single-particle cryo-electron microscopy. The proposed algorithm finds the globally optimal assignment of orientations that simultaneously respects all common lines between all images. The contribution of each common line to the estimated orientations is weighted according to a statistical model for common lines' detection errors. The key property of the proposed algorithm is that it finds the global optimum for the orientations given the common lines. In particular, any local optima in the common lines energy landscape do not affect the proposed algorithm. As a result, it is applicable to thousands of images at once, very robust to noise, completely reference free, and not biased towards any initial model. A byproduct of the algorithm is a set of measures that allow one to assess the reliability of the obtained ab-initio model. We demonstrate the algorithm using class averages from two experimental data sets, resulting in ab-initio models with resolutions of 20 Å or better, even from class averages consisting of as few as three raw images per class.
Submitted 26 October, 2018;
originally announced October 2018.
-
The steerable graph Laplacian and its application to filtering image data-sets
Authors:
Boris Landa,
Yoel Shkolnisky
Abstract:
In recent years, improvements in various image acquisition techniques have given rise to the need for adaptive processing methods, aimed particularly at large datasets corrupted by noise and deformations. In this work, we consider datasets of images sampled from a low-dimensional manifold (i.e., an image-valued manifold), where the images can assume arbitrary planar rotations. To derive an adaptive and rotation-invariant framework for processing such datasets, we introduce a graph Laplacian (GL)-like operator over the dataset, termed the steerable graph Laplacian. Essentially, the steerable GL extends the standard GL by accounting for all (infinitely many) planar rotations of all images. As it turns out, similarly to the standard GL, a properly normalized steerable GL converges to the Laplace-Beltrami operator on the low-dimensional manifold. However, the steerable GL admits an improved convergence rate compared to the GL, where the improved convergence behaves as if the intrinsic dimension of the underlying manifold is lower by one. Moreover, it is shown that the steerable GL admits eigenfunctions of the form of Fourier modes (along the orbits of the images' rotations) multiplied by eigenvectors of certain matrices, which can be computed efficiently by the FFT. For image datasets corrupted by noise, we employ a subset of these eigenfunctions to "filter" the dataset via a Fourier-like filtering scheme, essentially using all images and their rotations simultaneously. We demonstrate our filtering framework by de-noising simulated single-particle cryo-EM image datasets.
Submitted 7 August, 2018; v1 submitted 6 February, 2018;
originally announced February 2018.
-
Symmetric rank-one updates from partial spectrum with an application to out-of-sample extension
Authors:
Roy Mitz,
Nir Sharon,
Yoel Shkolnisky
Abstract:
Rank-one update of the spectrum of a matrix is a fundamental problem in classical perturbation theory. In this paper, we consider its variant where only part of the spectrum is known. We address this variant using an efficient scheme for updating the known eigenpairs with guaranteed error bounds. Then, we apply our scheme to the extension of the top eigenvectors of the graph Laplacian to a new data sample. In particular, we model this extension as a perturbation problem and show how to solve it using our rank-one updating scheme. We provide a theoretical analysis of this extension method, and back it up with numerical results that illustrate its advantages.
Submitted 8 July, 2019; v1 submitted 7 October, 2017;
originally announced October 2017.
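As background for the abstract above, the classical full-spectrum version of the rank-one update problem reduces to a scalar "secular" equation. The statement below is the standard textbook result under the stated assumptions; the symbols ($A$, $Q$, $\lambda_i$, $v$, $z$, $\rho$, $t$) are chosen here for illustration, and the paper's scheme for the partial-spectrum case is not reproduced.

```latex
% Symmetric A = Q \Lambda Q^T with \Lambda = diag(\lambda_1, ..., \lambda_n),
% perturbed by the rank-one term \rho v v^T. Let z = Q^T v, and assume the
% \lambda_i are distinct and every z_i is nonzero.
\[
  \text{The eigenvalues of } A + \rho\, v v^{\top}
  \text{ are the roots } t \text{ of} \quad
  f(t) \;=\; 1 + \rho \sum_{i=1}^{n} \frac{z_i^{2}}{\lambda_i - t} \;=\; 0 ,
\]
\[
  \text{and the eigenvector belonging to a root } t_j \text{ is proportional to} \quad
  Q \, (\Lambda - t_j I)^{-1} z .
\]
```

When only some of the eigenpairs $(\lambda_i, q_i)$ are available, the sum in $f(t)$ cannot be evaluated in full, which is the difficulty the partial-spectrum variant has to deal with.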
-
A max-cut approach to heterogeneity in cryo-electron microscopy
Authors:
Yariv Aizenbud,
Yoel Shkolnisky
Abstract:
The field of cryo-electron microscopy has made astounding advancements in the past few years, mainly due to advancements in electron detectors' technology. Yet, one of the key open challenges of the field remains the processing of heterogeneous data sets, produced from samples containing particles at several different conformational states. For such data sets, the algorithms must include some classification procedure to identify homogeneous groups within the data, so that the images in each group correspond to the same underlying structure. The fundamental importance of the heterogeneity problem in cryo-electron microscopy has drawn many research efforts, and resulted in significant progress in classification algorithms for heterogeneous data sets. While these algorithms are extremely useful and effective in practice, they lack rigorous mathematical analysis and performance guarantees.
In this paper, we attempt to make the first steps towards rigorous mathematical analysis of the heterogeneity problem in cryo-electron microscopy. To that end, we present an algorithm for processing heterogeneous data sets, and prove accuracy and stability bounds for it. We also suggest an extension of this algorithm that combines the classification and reconstruction steps. We demonstrate it on simulated data, and compare its performance to the state-of-the-art algorithm in RELION.
Submitted 3 October, 2019; v1 submitted 5 September, 2016;
originally announced September 2016.
-
Steerable Principal Components for Space-Frequency Localized Images
Authors:
Boris Landa,
Yoel Shkolnisky
Abstract:
This paper describes a fast and accurate method for obtaining steerable principal components from a large dataset of images, assuming the images are well localized in space and frequency. The obtained steerable principal components are optimal for expanding the images in the dataset and all of their rotations. The method relies upon first expanding the images using a series of two-dimensional Prolate Spheroidal Wave Functions (PSWFs), where the expansion coefficients are evaluated using a specially designed numerical integration scheme. Then, the expansion coefficients are used to construct a rotationally-invariant covariance matrix which admits a block-diagonal structure, and the eigen-decomposition of its blocks provides us with the desired steerable principal components. The proposed method is shown to be faster than existing methods, while providing appropriate error bounds which guarantee its accuracy.
Submitted 9 August, 2018; v1 submitted 9 August, 2016;
originally announced August 2016.
-
Multi-View Kernel Consensus For Data Analysis
Authors:
Moshe Salhov,
Ofir Lindenbaum,
Yariv Aizenbud,
Avi Silberschatz,
Yoel Shkolnisky,
Amir Averbuch
Abstract:
The input feature set for many data-driven tasks is high-dimensional, while the intrinsic dimension of the data is low. Data analysis methods aim to uncover the underlying low-dimensional structure imposed by the low-dimensional hidden parameters by utilizing distance metrics that consider the set of attributes as a single monolithic set. However, the transformation of the low-dimensional phenomena into the measured high-dimensional observations might distort the distance metric. This distortion can affect the estimated low-dimensional geometric structure. In this paper, we suggest utilizing the redundancy in the attribute domain by partitioning the attributes into multiple subsets we call views. The proposed methods utilize the agreement, also called consensus, between different views to extract valuable geometric information that unifies multiple views about the intrinsic relationships among several different observations. This unification enhances the information that a single view or a simple concatenation of views provides.
Submitted 29 January, 2019; v1 submitted 28 June, 2016;
originally announced June 2016.
-
Machine olfaction using time scattering of sensor multiresolution graphs
Authors:
Leonid Gugel,
Yoel Shkolnisky,
Shai Dekel
Abstract:
In this paper we construct a learning architecture for high dimensional time series sampled by sensor arrangements. Using a redundant wavelet decomposition on a graph constructed over the sensor locations, our algorithm is able to construct discriminative features that exploit the mutual information between the sensors. The algorithm then applies scattering networks to the time series graphs to create the feature space. We demonstrate our method on a machine olfaction problem, where one needs to classify the gas type and its source location from data sampled by an array of sensors. Our experimental results clearly demonstrate that our method outperforms classical machine learning techniques used in previous studies.
Submitted 13 February, 2016;
originally announced February 2016.
-
Direct Inversion of the 3D Pseudo-polar Fourier Transform
Authors:
Amir Averbuch,
Gil Shabat,
Yoel Shkolnisky
Abstract:
The pseudo-polar Fourier transform is a specialized non-equally spaced Fourier transform, which evaluates the Fourier transform on a near-polar grid, known as the pseudo-polar grid. The advantage of the pseudo-polar grid over other non-uniform sampling geometries is that the transformation, which samples the Fourier transform on the pseudo-polar grid, can be inverted using a fast and stable algorithm. For other sampling geometries, even if the non-equally spaced Fourier transform can be inverted, the only known algorithms are iterative. The convergence speed of these algorithms, as well as their accuracy, is difficult to control, as both depend on the sampling geometry and on the unknown reconstructed object. In this paper, we present a direct inversion algorithm for the three-dimensional pseudo-polar Fourier transform. The algorithm is based only on one-dimensional resampling operations, and is shown to be significantly faster than existing iterative inversion algorithms.
Submitted 6 February, 2016; v1 submitted 22 July, 2015;
originally announced July 2015.
-
Evaluating Non-Analytic Functions of Matrices
Authors:
Nir Sharon,
Yoel Shkolnisky
Abstract:
The paper revisits the classical problem of evaluating $f(A)$ for a real function $f$ and a matrix $A$ with real spectrum. The evaluation is based on expanding $f$ in Chebyshev polynomials, and the focus of the paper is to study the convergence rates of these expansions. In particular, we derive bounds on the convergence rates which reveal the relation between the smoothness of $f$ and the diagonalizability of the matrix $A$. We present several numerical examples to illustrate our analysis.
Submitted 22 December, 2018; v1 submitted 14 July, 2015;
originally announced July 2015.
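The Chebyshev-expansion idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function name `cheb_matrix_function`, the interval endpoints `a` and `b` (assumed to enclose the spectrum of $A$), and the default degree are all assumptions made for illustration. The coefficients come from interpolating $f$ at Chebyshev points, and the matrix polynomials are built with the three-term recurrence $T_{k+1}(B) = 2BT_k(B) - T_{k-1}(B)$.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def cheb_matrix_function(f, A, a, b, deg=60):
    """Approximate f(A) by a degree-`deg` Chebyshev expansion on [a, b].

    Assumes the spectrum of A is real and contained in [a, b], and deg >= 1.
    """
    # Pull f back from [a, b] to the reference interval [-1, 1].
    def g(t):
        return f(0.5 * (b - a) * t + 0.5 * (b + a))

    # Chebyshev coefficients of the interpolant of g at Chebyshev points.
    coeffs = C.chebinterpolate(g, deg)

    # Affinely rescale A so that its spectrum lies in [-1, 1].
    identity = np.eye(A.shape[0])
    B = (2.0 * A - (a + b) * identity) / (b - a)

    # Accumulate sum_k c_k T_k(B) using the three-term recurrence.
    T_prev, T_curr = identity, B
    result = coeffs[0] * T_prev + coeffs[1] * T_curr
    for c in coeffs[2:]:
        T_prev, T_curr = T_curr, 2.0 * B @ T_curr - T_prev
        result = result + c * T_curr
    return result
```

For a smooth $f$ such as the exponential, a modest degree already gives near machine-precision agreement with an eigendecomposition-based reference on a symmetric matrix; for a non-smooth $f$ such as the absolute value, many more terms are needed, which is the smoothness dependence the abstract refers to.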
-
An algorithm for improving Non-Local Means operators via low-rank approximation
Authors:
Victor May,
Yosi Keller,
Nir Sharon,
Yoel Shkolnisky
Abstract:
We present a method for improving a Non-Local Means operator by computing its low-rank approximation. The low-rank operator is constructed by applying a filter to the spectrum of the original Non-Local Means operator. This results in an operator which is less sensitive to noise while preserving important properties of the original operator. The method is efficiently implemented based on Chebyshev polynomials and is demonstrated on the application of natural image denoising. For this application, we provide a comprehensive comparison of our method with leading denoising methods.
Submitted 20 November, 2014;
originally announced December 2014.
-
Fast Steerable Principal Component Analysis
Authors:
Zhizhen Zhao,
Yoel Shkolnisky,
Amit Singer
Abstract:
Cryo-electron microscopy nowadays often requires the analysis of hundreds of thousands of 2D images as large as a few hundred pixels in each direction. Here we introduce an algorithm that efficiently and accurately performs principal component analysis (PCA) for a large set of two-dimensional images, and, for each image, the set of its uniform rotations in the plane and their reflections. For a dataset consisting of $n$ images of size $L \times L$ pixels, the computational complexity of our algorithm is $O(nL^3 + L^4)$, while existing algorithms take $O(nL^4)$. The new algorithm computes the expansion coefficients of the images in a Fourier-Bessel basis efficiently using the non-uniform fast Fourier transform. We compare the accuracy and efficiency of the new algorithm with traditional PCA and existing algorithms for steerable PCA.
Submitted 15 December, 2015; v1 submitted 1 December, 2014;
originally announced December 2014.
-
A Fourier-based Approach for Iterative 3D Reconstruction from Cryo-EM Images
Authors:
Lanhui Wang,
Yoel Shkolnisky,
Amit Singer
Abstract:
A major challenge in single particle reconstruction methods using cryo-electron microscopy is to attain a resolution sufficient to interpret fine details in three-dimensional (3D) macromolecular structures. Obtaining high resolution 3D reconstructions is difficult due to unknown orientations and positions of the imaged particles, possible incomplete coverage of the viewing directions, high level of noise in the projection images, and limiting effects of the contrast transfer function of the electron microscope. In this paper, we focus on the 3D reconstruction problem from projection images assuming an existing estimate for their orientations and positions. We propose a fast and accurate Fourier-based Iterative Reconstruction Method (FIRM) that exploits the Toeplitz structure of the operator ${\bf A}^{*}{\bf A}$, where $\bf A$ is the forward projector and ${\bf A}^{*}$ is the back projector. The operator ${\bf A}^{*}{\bf A}$ is equivalent to a convolution with a kernel. The kernel is pre-computed using the non-uniform Fast Fourier Transform and is efficiently applied in each iteration step. The iterations by FIRM are therefore considerably faster than those of traditional iterative algebraic approaches, while maintaining the same accuracy even when the viewing directions are unevenly distributed. The time complexity of FIRM is comparable to the direct Fourier inversion method. Moreover, FIRM combines images from different defocus groups simultaneously and can handle a wide range of regularization terms. We provide experimental results on simulated data that demonstrate the speed and accuracy of FIRM in comparison with current methods.
Submitted 22 July, 2013;
originally announced July 2013.
-
An algorithm for the principal component analysis of large data sets
Authors:
Nathan Halko,
Per-Gunnar Martinsson,
Yoel Shkolnisky,
Mark Tygert
Abstract:
Recently popularized randomized methods for principal component analysis (PCA) efficiently and reliably produce nearly optimal accuracy --- even on parallel processors --- unlike the classical (deterministic) alternatives. We adapt one of these randomized methods for use with data sets that are too large to be stored in random-access memory (RAM). (The traditional terminology is that our procedure works efficiently "out-of-core.") We illustrate the performance of the algorithm via several numerical examples. For example, we report on the PCA of a data set stored on disk that is so large that less than a hundredth of it can fit in our computer's RAM.
Submitted 19 March, 2011; v1 submitted 30 July, 2010;
originally announced July 2010.
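As a rough illustration of the randomized approach to PCA mentioned in the abstract above, the following is a minimal in-memory sketch of a generic randomized range finder with a few power iterations. It is not the out-of-core procedure of the paper, which processes data stored on disk; the function name `randomized_pca` and the parameters `oversample` and `n_iter` are assumptions chosen for illustration.

```python
import numpy as np


def randomized_pca(X, k, oversample=10, n_iter=2, seed=0):
    """Approximate the top-k principal components of the rows of X.

    Returns (U, s, Vt) with U of shape (n, k), s of shape (k,), and
    Vt of shape (k, d), approximating the truncated SVD of centered X.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)  # center each column (feature)

    # Sample the range of Xc with a Gaussian test matrix.
    omega = rng.standard_normal((Xc.shape[1], k + oversample))
    Q, _ = np.linalg.qr(Xc @ omega)

    # Power iterations sharpen the captured subspace; re-orthonormalizing
    # after every product keeps the computation numerically stable.
    for _ in range(n_iter):
        Z, _ = np.linalg.qr(Xc.T @ Q)
        Q, _ = np.linalg.qr(Xc @ Z)

    # Project onto the small subspace and take an exact SVD there.
    B = Q.T @ Xc
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_small)[:, :k], s[:k], Vt[:k]
```

In this sketch the only passes over the data are the products with `Xc` and `Xc.T`, which is the property that makes this family of methods a natural fit for data that cannot be held in RAM.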