Search | arXiv e-print repository

doi 10.1007/s10543-024-01024-x

Stochastic Iterative Methods for Online Rank Aggregation from Pairwise Comparisons

Authors: Benjamin Jarman, Lara Kassab, Deanna Needell, Alexander Sietsema

Abstract: In this paper, we consider large-scale ranking problems where one is given a set of (possibly non-redundant) pairwise comparisons and the underlying ranking explained by those comparisons is desired. We show that stochastic gradient descent approaches can be leveraged to offer convergence to a solution that reveals the underlying ranking while requiring low-memory operations. We introduce several… ▽ More In this paper, we consider large-scale ranking problems where one is given a set of (possibly non-redundant) pairwise comparisons and the underlying ranking explained by those comparisons is desired. We show that stochastic gradient descent approaches can be leveraged to offer convergence to a solution that reveals the underlying ranking while requiring low-memory operations. We introduce several variations of this approach that offer a tradeoff in speed and convergence when the pairwise comparisons are noisy (i.e., some comparisons do not respect the underlying ranking). We prove theoretical results for convergence almost surely and study several regimes including those with full observations, partial observations, and noisy observations. Our empirical results give insights into the number of observations required as well as how much noise in those measurements can be tolerated. △ Less

Submitted 2 July, 2024; originally announced July 2024.

Journal ref: Bit Numer Math 64, 26 (2024)

arXiv:2406.12021 [pdf, other]

Block Matrix and Tensor Randomized Kaczmarz Methods for Linear Feasibility Problems

Authors: Minxin Zhang, Jamie Haddock, Deanna Needell

Abstract: The randomized Kaczmarz methods are a popular and effective family of iterative methods for solving large-scale linear systems of equations, which have also been applied to linear feasibility problems. In this work, we propose a new block variant of the randomized Kaczmarz method, B-MRK, for solving linear feasibility problems defined by matrices. We show that B-MRK converges linearly in expectati… ▽ More The randomized Kaczmarz methods are a popular and effective family of iterative methods for solving large-scale linear systems of equations, which have also been applied to linear feasibility problems. In this work, we propose a new block variant of the randomized Kaczmarz method, B-MRK, for solving linear feasibility problems defined by matrices. We show that B-MRK converges linearly in expectation to the feasible region.Furthermore, we extend the method to solve tensor linear feasibility problems defined under the tensor t-product. A tensor randomized Kaczmarz (TRK) method, TRK-L, is proposed for solving linear feasibility problems that involve mixed equality and inequality constraints. Additionally, we introduce another TRK method, TRK-LB, specifically tailored for cases where the feasible region is defined by linear equality constraints coupled with bound constraints on the variables. We show that both of the TRK methods converge linearly in expectation to the feasible region. Moreover, the effectiveness of our methods is demonstrated through numerical experiments on various Gaussian random data and applications in image deblurring. △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2405.05818 [pdf, ps, other]

Fine-grained Analysis and Faster Algorithms for Iteratively Solving Linear Systems

Authors: Michał Dereziński, Daniel LeJeune, Deanna Needell, Elizaveta Rebrova

Abstract: While effective in practice, iterative methods for solving large systems of linear equations can be significantly affected by problem-dependent condition number quantities. This makes characterizing their time complexity challenging, particularly when we wish to make comparisons between deterministic and stochastic methods, that may or may not rely on preconditioning and/or fast matrix multiplicat… ▽ More While effective in practice, iterative methods for solving large systems of linear equations can be significantly affected by problem-dependent condition number quantities. This makes characterizing their time complexity challenging, particularly when we wish to make comparisons between deterministic and stochastic methods, that may or may not rely on preconditioning and/or fast matrix multiplication. In this work, we consider a fine-grained notion of complexity for iterative linear solvers which we call the spectral tail condition number, $κ_\ell$, defined as the ratio between the $\ell$th largest and the smallest singular value of the matrix representing the system. Concretely, we prove the following main algorithmic result: Given an $n\times n$ matrix $A$ and a vector $b$, we can find $\tilde{x}$ such that $\|A\tilde{x}-b\|\leqε\|b\|$ in time $\tilde{O}(κ_\ell\cdot n^2\log 1/ε)$ for any $\ell = O(n^{\frac1{ω-1}})=O(n^{0.729})$, where $ω\approx 2.372$ is the current fast matrix multiplication exponent. This guarantee is achieved by Sketch-and-Project with Nesterov's acceleration. Some of the implications of our result, and of the use of $κ_\ell$, include direct improvement over a fine-grained analysis of the Conjugate Gradient method, suggesting a stronger separation between deterministic and stochastic iterative solvers; and relating the complexity of iterative solvers to the ongoing algorithmic advances in fast matrix multiplication, since the bound on $\ell$ improves with $ω$. Our main technical contributions are new sharp characterizations for the first and second moments of the random projection matrix that commonly arises in sketching algorithms, building on a combination of techniques from combinatorial sampling via determinantal point processes and Gaussian universality results from random matrix theory. △ Less

Submitted 9 May, 2024; originally announced May 2024.

Comments: 32 pages

arXiv:2405.03073 [pdf, other]

Convergence and Complexity Guarantee for Inexact First-order Riemannian Optimization Algorithms

Authors: Yuchen Li, Laura Balzano, Deanna Needell, Hanbaek Lyu

Abstract: We analyze inexact Riemannian gradient descent (RGD) where Riemannian gradients and retractions are inexactly (and cheaply) computed. Our focus is on understanding when inexact RGD converges and what is the complexity in the general nonconvex and constrained setting. We answer these questions in a general framework of tangential Block Majorization-Minimization (tBMM). We establish that tBMM conver… ▽ More We analyze inexact Riemannian gradient descent (RGD) where Riemannian gradients and retractions are inexactly (and cheaply) computed. Our focus is on understanding when inexact RGD converges and what is the complexity in the general nonconvex and constrained setting. We answer these questions in a general framework of tangential Block Majorization-Minimization (tBMM). We establish that tBMM converges to an $ε$-stationary point within $O(ε^{-2})$ iterations. Under a mild assumption, the results still hold when the subproblem is solved inexactly in each iteration provided the total optimality gap is bounded. Our general analysis applies to a wide range of classical algorithms with Riemannian constraints including inexact RGD and proximal gradient method on Stiefel manifolds. We numerically validate that tBMM shows improved performance over existing methods when applied to various problems, including nonnegative tensor decomposition with Riemannian constraints, regularized nonnegative matrix factorization, and low-rank matrix recovery problems. △ Less

Submitted 9 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

Comments: 23 pages, 5 figures. ICML 2024. Appendix revised

arXiv:2403.14688 [pdf, other]

Kernel Alignment for Unsupervised Feature Selection via Matrix Factorization

Authors: Ziyuan Lin, Deanna Needell

Abstract: By removing irrelevant and redundant features, feature selection aims to find a good representation of the original features. With the prevalence of unlabeled data, unsupervised feature selection has been proven effective in alleviating the so-called curse of dimensionality. Most existing matrix factorization-based unsupervised feature selection methods are built upon subspace learning, but they h… ▽ More By removing irrelevant and redundant features, feature selection aims to find a good representation of the original features. With the prevalence of unlabeled data, unsupervised feature selection has been proven effective in alleviating the so-called curse of dimensionality. Most existing matrix factorization-based unsupervised feature selection methods are built upon subspace learning, but they have limitations in capturing nonlinear structural information among features. It is well-known that kernel techniques can capture nonlinear structural information. In this paper, we construct a model by integrating kernel functions and kernel alignment, which can be equivalently characterized as a matrix factorization problem. However, such an extension raises another issue: the algorithm performance heavily depends on the choice of kernel, which is often unknown a priori. Therefore, we further propose a multiple kernel-based learning method. By doing so, our model can learn both linear and nonlinear similarity information and automatically generate the most appropriate kernel. Experimental analysis on real-world data demonstrates that the two proposed methods outperform other classic and state-of-the-art unsupervised feature selection methods in terms of clustering results and redundancy reduction in almost all datasets tested. △ Less

Submitted 13 March, 2024; originally announced March 2024.

MSC Class: 65F10; 65F22; 90C26

arXiv:2403.06903 [pdf, ps, other]

Benign overfitting in leaky ReLU networks with moderate input dimension

Authors: Kedar Karhadkar, Erin George, Michael Murray, Guido Montúfar, Deanna Needell

Abstract: The problem of benign overfitting asks whether it is possible for a model to perfectly fit noisy training data and still generalize well. We study benign overfitting in two-layer leaky ReLU networks trained with the hinge loss on a binary classification task. We consider input data which can be decomposed into the sum of a common signal and a random noise component, which lie on subspaces orthogon… ▽ More The problem of benign overfitting asks whether it is possible for a model to perfectly fit noisy training data and still generalize well. We study benign overfitting in two-layer leaky ReLU networks trained with the hinge loss on a binary classification task. We consider input data which can be decomposed into the sum of a common signal and a random noise component, which lie on subspaces orthogonal to one another. We characterize conditions on the signal to noise ratio (SNR) of the model parameters giving rise to benign versus non-benign, or harmful, overfitting: in particular, if the SNR is high then benign overfitting occurs, conversely if the SNR is low then harmful overfitting occurs. We attribute both benign and non-benign overfitting to an approximate margin maximization property and show that leaky ReLU networks trained on hinge loss with Gradient Descent (GD) satisfy this property. In contrast to prior work we do not require near orthogonality conditions on the training data: notably, for input dimension $d$ and training sample size $n$, while prior work shows asymptotically optimal error when $d = Ω(n^2 \log n)$, here we require only $d = Ω\left(n \log \frac{1}ε\right)$ to obtain error within $ε$ of optimal. △ Less

Submitted 11 March, 2024; originally announced March 2024.

Comments: 36 pages

arXiv:2403.01204 [pdf, ps, other]

Stochastic gradient descent for streaming linear and rectified linear systems with Massart noise

Authors: Halyun Jeong, Deanna Needell, Elizaveta Rebrova

Abstract: We propose SGD-exp, a stochastic gradient descent approach for linear and ReLU regressions under Massart noise (adversarial semi-random corruption model) for the fully streaming setting. We show novel nearly linear convergence guarantees of SGD-exp to the true parameter with up to $50\%$ Massart corruption rate, and with any corruption rate in the case of symmetric oblivious corruptions. This is t… ▽ More We propose SGD-exp, a stochastic gradient descent approach for linear and ReLU regressions under Massart noise (adversarial semi-random corruption model) for the fully streaming setting. We show novel nearly linear convergence guarantees of SGD-exp to the true parameter with up to $50\%$ Massart corruption rate, and with any corruption rate in the case of symmetric oblivious corruptions. This is the first convergence guarantee result for robust ReLU regression in the streaming setting, and it shows the improved convergence rate over previous robust methods for $L_1$ linear regression due to a choice of an exponentially decaying step size, known for its efficiency in practice. Our analysis is based on the drift analysis of a discrete stochastic process, which could also be interesting on its own. △ Less

Submitted 2 March, 2024; originally announced March 2024.

Comments: Submitted to a journal

MSC Class: 65F10; 60-XX

arXiv:2402.14224 [pdf, other]

Framing in the Presence of Supporting Data: A Case Study in U.S. Economic News

Authors: Alexandria Leto, Elliot Pickens, Coen D. Needell, David Rothschild, Maria Leonor Pacheco

Abstract: The mainstream media has much leeway in what it chooses to cover and how it covers it. These choices have real-world consequences on what people know and their subsequent behaviors. However, the lack of objective measures to evaluate editorial choices makes research in this area particularly difficult. In this paper, we argue that there are newsworthy topics where objective measures exist in the f… ▽ More The mainstream media has much leeway in what it chooses to cover and how it covers it. These choices have real-world consequences on what people know and their subsequent behaviors. However, the lack of objective measures to evaluate editorial choices makes research in this area particularly difficult. In this paper, we argue that there are newsworthy topics where objective measures exist in the form of supporting data and propose a computational framework to analyze editorial choices in this setup. We focus on the economy because the reporting of economic indicators presents us with a relatively easy way to determine both the selection and framing of various publications. Their values provide a ground truth of how the economy is doing relative to how the publications choose to cover it. To do this, we define frame prediction as a set of interdependent tasks. At the article level, we learn to identify the reported stance towards the general state of the economy. Then, for every numerical quantity reported in the article, we learn to identify whether it corresponds to an economic indicator and whether it is being reported in a positive or negative way. To perform our analysis, we track six American publishers and each article that appeared in the top 10 slots of their landing page between 2015 and 2023. △ Less

Submitted 22 February, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: total pages: 19; main body pages: 8; total figures: 19

arXiv:2312.10330 [pdf, other]

Convergence and complexity of block majorization-minimization for constrained block-Riemannian optimization

Authors: Yuchen Li, Laura Balzano, Deanna Needell, Hanbaek Lyu

Abstract: Block majorization-minimization (BMM) is a simple iterative algorithm for nonconvex optimization that sequentially minimizes a majorizing surrogate of the objective function in each block coordinate while the other block coordinates are held fixed. We consider a family of BMM algorithms for minimizing smooth nonconvex objectives, where each parameter block is constrained within a subset of a Riema… ▽ More Block majorization-minimization (BMM) is a simple iterative algorithm for nonconvex optimization that sequentially minimizes a majorizing surrogate of the objective function in each block coordinate while the other block coordinates are held fixed. We consider a family of BMM algorithms for minimizing smooth nonconvex objectives, where each parameter block is constrained within a subset of a Riemannian manifold. We establish that this algorithm converges asymptotically to the set of stationary points, and attains an $ε$-stationary point within $\widetilde{O}(ε^{-2})$ iterations. In particular, the assumptions for our complexity results are completely Euclidean when the underlying manifold is a product of Euclidean or Stiefel manifolds, although our analysis makes explicit use of the Riemannian geometry. Our general analysis applies to a wide range of algorithms with Riemannian constraints: Riemannian MM, block projected gradient descent, optimistic likelihood estimation, geodesically constrained subspace tracking, robust PCA, and Riemannian CP-dictionary-learning. We experimentally validate that our algorithm converges faster than standard Euclidean algorithms applied to the Riemannian setting. △ Less

Submitted 16 December, 2023; originally announced December 2023.

Comments: 54 pages, 8 figures

arXiv:2311.10789 [pdf, other]

Stratified-NMF for Heterogeneous Data

Authors: James Chapman, Yotam Yaniv, Deanna Needell

Abstract: Non-negative matrix factorization (NMF) is an important technique for obtaining low dimensional representations of datasets. However, classical NMF does not take into account data that is collected at different times or in different locations, which may exhibit heterogeneity. We resolve this problem by solving a modified NMF objective, Stratified-NMF, that simultaneously learns strata-dependent st… ▽ More Non-negative matrix factorization (NMF) is an important technique for obtaining low dimensional representations of datasets. However, classical NMF does not take into account data that is collected at different times or in different locations, which may exhibit heterogeneity. We resolve this problem by solving a modified NMF objective, Stratified-NMF, that simultaneously learns strata-dependent statistics and a shared topics matrix. We develop multiplicative update rules for this novel objective and prove convergence of the objective. Then, we experiment on synthetic data to demonstrate the efficiency and accuracy of the method. Lastly, we apply our method to three real world datasets and empirically investigate their learned features. △ Less

Submitted 16 November, 2023; originally announced November 2023.

Comments: 5 pages. Will appear in IEEE Asilomar Conference on Signals, Systems, and Computers 2023

ACM Class: G.1.6; I.5.3; I.5.4

arXiv:2308.13709 [pdf, other]

Fast and Low-Memory Compressive Sensing Algorithms for Low Tucker-Rank Tensor Approximation from Streamed Measurements

Authors: Cullen Haselby, Mark A. Iwen, Deanna Needell, Elizaveta Rebrova, William Swartworth

Abstract: In this paper we consider the problem of recovering a low-rank Tucker approximation to a massive tensor based solely on structured random compressive measurements. Crucially, the proposed random measurement ensembles are both designed to be compactly represented (i.e., low-memory), and can also be efficiently computed in one-pass over the tensor. Thus, the proposed compressive sensing approach may… ▽ More In this paper we consider the problem of recovering a low-rank Tucker approximation to a massive tensor based solely on structured random compressive measurements. Crucially, the proposed random measurement ensembles are both designed to be compactly represented (i.e., low-memory), and can also be efficiently computed in one-pass over the tensor. Thus, the proposed compressive sensing approach may be used to produce a low-rank factorization of a huge tensor that is too large to store in memory with a total memory footprint on the order of the much smaller desired low-rank factorization. In addition, the compressive sensing recovery algorithm itself (which takes the compressive measurements as input, and then outputs a low-rank factorization) also runs in a time which principally depends only on the size of the sought factorization, making its runtime sub-linear in the size of the large tensor one is approximating. Finally, unlike prior works related to (streaming) algorithms for low-rank tensor approximation from such compressive measurements, we present a unified analysis of both Kronecker and Khatri-Rao structured measurement ensembles culminating in error guarantees comparing the error of our recovery algorithm's approximation of the input tensor to the best possible low-rank Tucker approximation error achievable for the tensor by any possible algorithm. We further include an empirical study of the proposed approach that verifies our theoretical findings and explores various trade-offs of parameters of interest. △ Less

Submitted 25 August, 2023; originally announced August 2023.

Comments: 59 pages, 8 figures

MSC Class: 65F55

arXiv:2308.00695 [pdf, ps, other]

Harnessing the Power of Sample Abundance: Theoretical Guarantees and Algorithms for Accelerated One-Bit Sensing

Authors: Arian Eamaz, Farhang Yeganegi, Deanna Needell, Mojtaba Soltanalian

Abstract: One-bit quantization with time-varying sampling thresholds (also known as random dithering) has recently found significant utilization potential in statistical signal processing applications due to its relatively low power consumption and low implementation cost. In addition to such advantages, an attractive feature of one-bit analog-to-digital converters (ADCs) is their superior sampling rates as… ▽ More One-bit quantization with time-varying sampling thresholds (also known as random dithering) has recently found significant utilization potential in statistical signal processing applications due to its relatively low power consumption and low implementation cost. In addition to such advantages, an attractive feature of one-bit analog-to-digital converters (ADCs) is their superior sampling rates as compared to their conventional multi-bit counterparts. This characteristic endows one-bit signal processing frameworks with what one may refer to as sample abundance. We show that sample abundance plays a pivotal role in many signal recovery and optimization problems that are formulated as (possibly non-convex) quadratic programs with linear feasibility constraints. Of particular interest to our work are low-rank matrix recovery and compressed sensing applications that take advantage of one-bit quantization. We demonstrate that the sample abundance paradigm allows for the transformation of such problems to merely linear feasibility problems by forming large-scale overdetermined linear systems -- thus removing the need for handling costly optimization constraints and objectives. To make the proposed computational cost savings achievable, we offer enhanced randomized Kaczmarz algorithms to solve these highly overdetermined feasibility problems and provide theoretical guarantees in terms of their convergence, sample size requirements, and overall performance. Several numerical results are presented to illustrate the effectiveness of the proposed methodologies. △ Less

Submitted 10 January, 2024; v1 submitted 1 August, 2023; originally announced August 2023.

Comments: arXiv admin note: text overlap with arXiv:2301.03467

arXiv:2307.04056 [pdf, other]

Manifold Filter-Combine Networks

Authors: Joyce Chew, Edward De Brouwer, Smita Krishnaswamy, Deanna Needell, Michael Perlmutter

Abstract: We introduce a class of manifold neural networks (MNNs) that we call Manifold Filter-Combine Networks (MFCNs), that aims to further our understanding of MNNs, analogous to how the aggregate-combine framework helps with the understanding of graph neural networks (GNNs). This class includes a wide variety of subclasses that can be thought of as the manifold analog of various popular GNNs. We then co… ▽ More We introduce a class of manifold neural networks (MNNs) that we call Manifold Filter-Combine Networks (MFCNs), that aims to further our understanding of MNNs, analogous to how the aggregate-combine framework helps with the understanding of graph neural networks (GNNs). This class includes a wide variety of subclasses that can be thought of as the manifold analog of various popular GNNs. We then consider a method, based on building a data-driven graph, for implementing such networks when one does not have global knowledge of the manifold, but merely has access to finitely many sample points. We provide sufficient conditions for the network to provably converge to its continuum limit as the number of sample points tends to infinity. Unlike previous work (which focused on specific graph constructions), our rate of convergence does not directly depend on the number of filters used. Moreover, it exhibits linear dependence on the depth of the network rather than the exponential dependence obtained previously. Additionally, we provide several examples of interesting subclasses of MFCNs and of the rates of convergence that are obtained under specific graph constructions. △ Less

Submitted 5 September, 2023; v1 submitted 8 July, 2023; originally announced July 2023.

arXiv:2306.09955 [pdf, other]

Training shallow ReLU networks on noisy data using hinge loss: when do we overfit and is it benign?

Authors: Erin George, Michael Murray, William Swartworth, Deanna Needell

Abstract: We study benign overfitting in two-layer ReLU networks trained using gradient descent and hinge loss on noisy data for binary classification. In particular, we consider linearly separable data for which a relatively small proportion of labels are corrupted or flipped. We identify conditions on the margin of the clean data that give rise to three distinct training outcomes: benign overfitting, in w… ▽ More We study benign overfitting in two-layer ReLU networks trained using gradient descent and hinge loss on noisy data for binary classification. In particular, we consider linearly separable data for which a relatively small proportion of labels are corrupted or flipped. We identify conditions on the margin of the clean data that give rise to three distinct training outcomes: benign overfitting, in which zero loss is achieved and with high probability test data is classified correctly; overfitting, in which zero loss is achieved but test data is misclassified with probability lower bounded by a constant; and non-overfitting, in which clean points, but not corrupt points, achieve zero loss and again with high probability test data is classified correctly. Our analysis provides a fine-grained description of the dynamics of neurons throughout training and reveals two distinct phases: in the first phase clean points achieve close to zero loss, in the second phase clean points oscillate on the boundary of zero loss while corrupt points either converge towards zero loss or are eventually zeroed by the network. We prove these results using a combinatorial approach that involves bounding the number of clean versus corrupt updates across these phases of training. △ Less

Submitted 8 November, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

Comments: 48 pages, 2 figures, 1 table

arXiv:2306.04730 [pdf, other]

Stochastic Natural Thresholding Algorithms

Authors: Rachel Grotheer, Shuang Li, Anna Ma, Deanna Needell, **g Qin

Abstract: Sparse signal recovery is one of the most fundamental problems in various applications, including medical imaging and remote sensing. Many greedy algorithms based on the family of hard thresholding operators have been developed to solve the sparse signal recovery problem. More recently, Natural Thresholding (NT) has been proposed with improved computational efficiency. This paper proposes and disc… ▽ More Sparse signal recovery is one of the most fundamental problems in various applications, including medical imaging and remote sensing. Many greedy algorithms based on the family of hard thresholding operators have been developed to solve the sparse signal recovery problem. More recently, Natural Thresholding (NT) has been proposed with improved computational efficiency. This paper proposes and discusses convergence guarantees for stochastic natural thresholding algorithms by extending the NT from the deterministic version with linear measurements to the stochastic version with a general objective function. We also conduct various numerical experiments on linear and nonlinear measurements to demonstrate the performance of StoNT. △ Less

Submitted 7 June, 2023; originally announced June 2023.

arXiv:2306.00507 [pdf, other]

Curvature corrected tangent space-based approximation of manifold-valued data

Authors: Willem Diepeveen, Joyce Chew, Deanna Needell

Abstract: When generalizing schemes for real-valued data approximation or decomposition to data living in Riemannian manifolds, tangent space-based schemes are very attractive for the simple reason that these spaces are linear. An open challenge is to do this in such a way that the generalized scheme is applicable to general Riemannian manifolds, is global-geometry aware and is computationally feasible. Exi… ▽ More When generalizing schemes for real-valued data approximation or decomposition to data living in Riemannian manifolds, tangent space-based schemes are very attractive for the simple reason that these spaces are linear. An open challenge is to do this in such a way that the generalized scheme is applicable to general Riemannian manifolds, is global-geometry aware and is computationally feasible. Existing schemes have been unable to account for all three of these key factors at the same time. In this work, we take a systematic approach to develo** a framework that is able to account for all three factors. First, we will restrict ourselves to the -- still general -- class of symmetric Riemannian manifolds and show how curvature affects general manifold-valued tensor approximation schemes. Next, we show how the latter observations can be used in a general strategy for develo** approximation schemes that are also global-geometry aware. Finally, having general applicability and global-geometry awareness taken into account we restrict ourselves once more in a case study on low-rank approximation. Here we show how computational feasibility can be achieved and propose the curvature-corrected truncated higher-order singular value decomposition (CC-tHOSVD), whose performance is subsequently tested in numerical experiments with both synthetic and real data living in symmetric Riemannian manifolds with both positive and negative curvature. △ Less

Submitted 1 June, 2023; originally announced June 2023.

MSC Class: 53Z50; 15A69; 90C26; 90C30; 53-04; 53-08; 49Q99

arXiv:2305.14574 [pdf, ps, other]

Detecting and Mitigating Indirect Stereotypes in Word Embeddings

Authors: Erin George, Joyce Chew, Deanna Needell

Abstract: Societal biases in the usage of words, including harmful stereotypes, are frequently learned by common word embedding methods. These biases manifest not only between a word and an explicit marker of its stereotype, but also between words that share related stereotypes. This latter phenomenon, sometimes called "indirect bias,'' has resisted prior attempts at debiasing. In this paper, we propose a n… ▽ More Societal biases in the usage of words, including harmful stereotypes, are frequently learned by common word embedding methods. These biases manifest not only between a word and an explicit marker of its stereotype, but also between words that share related stereotypes. This latter phenomenon, sometimes called "indirect bias,'' has resisted prior attempts at debiasing. In this paper, we propose a novel method called Biased Indirect Relationship Modification (BIRM) to mitigate indirect bias in distributional word embeddings by modifying biased relationships between words before embeddings are learned. This is done by considering how the co-occurrence probability of a given pair of words changes in the presence of words marking an attribute of bias, and using this to average out the effect of a bias attribute. To evaluate this method, we perform a series of common tests and demonstrate that measures of bias in the word embeddings are reduced in exchange for minor reduction in the semantic quality of the embeddings. In addition, we conduct novel tests for measuring indirect stereotypes by extending the Word Embedding Association Test (WEAT) with new test sets for indirect binary gender stereotypes. With these tests, we demonstrate the presence of more subtle stereotypes not addressed by previous work. The proposed method is able to reduce the presence of some of these new stereotypes, serving as a crucial next step towards non-stereotyped word embeddings. △ Less

Submitted 23 May, 2023; originally announced May 2023.

Comments: 15 pages

arXiv:2305.04080 [pdf, other]

doi 10.1137/23M1574282

Robust Tensor CUR Decompositions: Rapid Low-Tucker-Rank Tensor Recovery with Sparse Corruption

Authors: HanQin Cai, Zehan Chao, Longxiu Huang, Deanna Needell

Abstract: We study the tensor robust principal component analysis (TRPCA) problem, a tensorial extension of matrix robust principal component analysis (RPCA), that aims to split the given tensor into an underlying low-rank component and a sparse outlier component. This work proposes a fast algorithm, called Robust Tensor CUR Decompositions (RTCUR), for large-scale non-convex TRPCA problems under the Tucker… ▽ More We study the tensor robust principal component analysis (TRPCA) problem, a tensorial extension of matrix robust principal component analysis (RPCA), that aims to split the given tensor into an underlying low-rank component and a sparse outlier component. This work proposes a fast algorithm, called Robust Tensor CUR Decompositions (RTCUR), for large-scale non-convex TRPCA problems under the Tucker rank setting. RTCUR is developed within a framework of alternating projections that projects between the set of low-rank tensors and the set of sparse tensors. We utilize the recently developed tensor CUR decomposition to substantially reduce the computational complexity in each projection. In addition, we develop four variants of RTCUR for different application settings. We demonstrate the effectiveness and computational advantages of RTCUR against state-of-the-art methods on both synthetic and real-world datasets. △ Less

Submitted 10 October, 2023; v1 submitted 6 May, 2023; originally announced May 2023.

MSC Class: 68p20; 68W20; 68W25; 68Q25; 65F30

Journal ref: SIAM Journal on Imaging Sciences 17 (1), 225-247, 2024

arXiv:2304.10123 [pdf, other]

Linear Convergence of Reshuffling Kaczmarz Methods With Sparse Constraints

Authors: Halyun Jeong, Deanna Needell

Abstract: The Kaczmarz method (KZ) and its variants, which are types of stochastic gradient descent (SGD) methods, have been extensively studied due to their simplicity and efficiency in solving linear equation systems. The iterative thresholding (IHT) method has gained popularity in various research fields, including compressed sensing or sparse linear regression, machine learning with additional structure… ▽ More The Kaczmarz method (KZ) and its variants, which are types of stochastic gradient descent (SGD) methods, have been extensively studied due to their simplicity and efficiency in solving linear equation systems. The iterative thresholding (IHT) method has gained popularity in various research fields, including compressed sensing or sparse linear regression, machine learning with additional structure, and optimization with nonconvex constraints. Recently, a hybrid method called Kaczmarz-based IHT (KZIHT) has been proposed, combining the benefits of both approaches, but its theoretical guarantees are missing. In this paper, we provide the first theoretical convergence guarantees for KZIHT by showing that it converges linearly to the solution of a system with sparsity constraints up to optimal statistical bias when the reshuffling data sampling scheme is used. We also propose the Kaczmarz with periodic thresholding (KZPT) method, which generalizes KZIHT by applying the thresholding operation for every certain number of KZ iterations and by employing two different types of step sizes. We establish a linear convergence guarantee for KZPT for randomly subsampled bounded orthonormal systems (BOS) and mean-zero isotropic sub-Gaussian random matrices, which are most commonly used models in compressed sensing, dimension reduction, matrix sketching, and many inverse problems in neural networks. Our analysis shows that KZPT with an optimal thresholding period outperforms KZIHT. To support our theory, we include several numerical experiments. △ Less

Submitted 20 April, 2023; originally announced April 2023.

Comments: Submitted to a journal

MSC Class: 65F10; 65F22; 90C26

arXiv:2304.04860 [pdf, other]

Iterative Singular Tube Hard Thresholding Algorithms for Tensor Recovery

Authors: Rachel Grotheer, Shuang Li, Anna Ma, Deanna Needell, **g Qin

Abstract: Due to the explosive growth of large-scale data sets, tensors have been a vital tool to analyze and process high-dimensional data. Different from the matrix case, tensor decomposition has been defined in various formats, which can be further used to define the best low-rank approximation of a tensor to significantly reduce the dimensionality for signal compression and recovery. In this paper, we c… ▽ More Due to the explosive growth of large-scale data sets, tensors have been a vital tool to analyze and process high-dimensional data. Different from the matrix case, tensor decomposition has been defined in various formats, which can be further used to define the best low-rank approximation of a tensor to significantly reduce the dimensionality for signal compression and recovery. In this paper, we consider the low-rank tensor recovery problem when the tubal rank of the underlying tensor is given or estimated a priori. We propose a novel class of iterative singular tube hard thresholding algorithms for tensor recovery based on the low-tubal-rank tensor approximation, including basic, accelerated deterministic and stochastic versions. Convergence guarantees are provided along with the special case when the measurements are linear. Numerical experiments on tensor compressive sensing and color image inpainting are conducted to demonstrate convergence and computational efficiency in practice. △ Less

Submitted 26 December, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

arXiv:2303.09594 [pdf, ps, other]

One-Bit Quadratic Compressed Sensing: From Sample Abundance to Linear Feasibility

Authors: Arian Eamaz, Farhang Yeganegi, Deanna Needell, Mojtaba Soltanalian

Abstract: One-bit quantization with time-varying sampling thresholds has recently found significant utilization potential in statistical signal processing applications due to its relatively low power consumption and low implementation cost. In addition to such advantages, an attractive feature of one-bit analog-to-digital converters (ADCs) is their superior sampling rates as compared to their conventional m… ▽ More One-bit quantization with time-varying sampling thresholds has recently found significant utilization potential in statistical signal processing applications due to its relatively low power consumption and low implementation cost. In addition to such advantages, an attractive feature of one-bit analog-to-digital converters (ADCs) is their superior sampling rates as compared to their conventional multi-bit counterparts. This characteristic endows one-bit signal processing frameworks with what we refer to as sample abundance. On the other hand, many signal recovery and optimization problems are formulated as (possibly non-convex) quadratic programs with linear feasibility constraints in the one-bit sampling regime. We demonstrate, with a particular focus on quadratic compressed sensing, that the sample abundance paradigm allows for the transformation of such quadratic problems to merely a linear feasibility problem by forming a large-scale overdetermined linear system; thus removing the need for costly optimization constraints and objectives. To efficiently tackle the emerging overdetermined linear feasibility problem, we further propose an enhanced randomized Kaczmarz algorithm, called Block SKM. Several numerical results are presented to illustrate the effectiveness of the proposed methodologies. △ Less

Submitted 16 March, 2023; originally announced March 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2301.03467

arXiv:2303.00058 [pdf, other]

Neural Nonnegative Matrix Factorization for Hierarchical Multilayer Topic Modeling

Authors: Tyler Will, Runyu Zhang, Eli Sadovnik, Mengdi Gao, Joshua Vendrow, Jamie Haddock, Denali Molitor, Deanna Needell

Abstract: We introduce a new method based on nonnegative matrix factorization, Neural NMF, for detecting latent hierarchical structure in data. Datasets with hierarchical structure arise in a wide variety of fields, such as document classification, image processing, and bioinformatics. Neural NMF recursively applies NMF in layers to discover overarching topics encompassing the lower-level features. We deriv… ▽ More We introduce a new method based on nonnegative matrix factorization, Neural NMF, for detecting latent hierarchical structure in data. Datasets with hierarchical structure arise in a wide variety of fields, such as document classification, image processing, and bioinformatics. Neural NMF recursively applies NMF in layers to discover overarching topics encompassing the lower-level features. We derive a backpropagation optimization scheme that allows us to frame hierarchical NMF as a neural network. We test Neural NMF on a synthetic hierarchical dataset, the 20 Newsgroups dataset, and the MyLymeData symptoms dataset. Numerical results demonstrate that Neural NMF outperforms other hierarchical NMF methods on these data sets and offers better learned hierarchical structure and interpretability of topics. △ Less

Submitted 28 February, 2023; originally announced March 2023.

arXiv:2302.14615 [pdf, other]

Randomized Kaczmarz in Adversarial Distributed Setting

Authors: Longxiu Huang, Xia Li, Deanna Needell

Abstract: Develo** large-scale distributed methods that are robust to the presence of adversarial or corrupted workers is an important part of making such methods practical for real-world problems. In this paper, we propose an iterative approach that is adversary-tolerant for convex optimization problems. By leveraging simple statistics, our method ensures convergence and is capable of adapting to adversa… ▽ More Develo** large-scale distributed methods that are robust to the presence of adversarial or corrupted workers is an important part of making such methods practical for real-world problems. In this paper, we propose an iterative approach that is adversary-tolerant for convex optimization problems. By leveraging simple statistics, our method ensures convergence and is capable of adapting to adversarial distributions. Additionally, the efficiency of the proposed methods for solving convex problems is shown in simulations with the presence of adversaries. Through simulations, we demonstrate the efficiency of our approach in the presence of adversaries and its ability to identify adversarial workers with high accuracy and tolerate varying levels of adversary rates. △ Less

Submitted 13 March, 2024; v1 submitted 23 February, 2023; originally announced February 2023.

MSC Class: 65F20; 65F10; 65K10

arXiv:2302.10755 [pdf, other]

Federated Gradient Matching Pursuit

Authors: Halyun Jeong, Deanna Needell, **g Qin

Abstract: Traditional machine learning techniques require centralizing all training data on one server or data hub. Due to the development of communication technologies and a huge amount of decentralized data on many clients, collaborative machine learning has become the main interest while providing privacy-preserving frameworks. In particular, federated learning (FL) provides such a solution to learn a sh… ▽ More Traditional machine learning techniques require centralizing all training data on one server or data hub. Due to the development of communication technologies and a huge amount of decentralized data on many clients, collaborative machine learning has become the main interest while providing privacy-preserving frameworks. In particular, federated learning (FL) provides such a solution to learn a shared model while kee** training data at local clients. On the other hand, in a wide range of machine learning and signal processing applications, the desired solution naturally has a certain structure that can be framed as sparsity with respect to a certain dictionary. This problem can be formulated as an optimization problem with sparsity constraints and solving it efficiently has been one of the primary research topics in the traditional centralized setting. In this paper, we propose a novel algorithmic framework, federated gradient matching pursuit (FedGradMP), to solve the sparsity constrained minimization problem in the FL setting. We also generalize our algorithms to accommodate various practical FL scenarios when only a subset of clients participate per round, when the local model estimation at clients could be inexact, or when the model parameters are sparse with respect to general dictionaries. Our theoretical analysis shows the linear convergence of the proposed algorithms. A variety of numerical experiments are conducted to demonstrate the great potential of the proposed framework -- fast convergence both in communication rounds and computation time for many important scenarios without sophisticated parameter tuning. △ Less

Submitted 20 February, 2023; originally announced February 2023.

Comments: Submitted to a journal

MSC Class: 65-xx; 68W15; 68W20

arXiv:2301.03467 [pdf, ps, other]

ORKA: Accelerated Kaczmarz Algorithms for Signal Recovery from One-Bit Samples

Authors: Arian Eamaz, Farhang Yeganegi, Deanna Needell, Mojtaba Soltanalian

Abstract: One-bit quantization with time-varying sampling thresholds has recently found significant utilization potential in statistical signal processing applications due to its relatively low power consumption and low implementation cost. In addition to such advantages, an attractive feature of one-bit analog-to-digital converters (ADCs) is their superior sampling rates as compared to their conventional m… ▽ More One-bit quantization with time-varying sampling thresholds has recently found significant utilization potential in statistical signal processing applications due to its relatively low power consumption and low implementation cost. In addition to such advantages, an attractive feature of one-bit analog-to-digital converters (ADCs) is their superior sampling rates as compared to their conventional multi-bit counterparts. This characteristic endows one-bit signal processing frameworks with what we refer to as sample abundance. On the other hand, many signal recovery and optimization problems are formulated as (possibly non-convex) quadratic programs with linear feasibility constraints in the one-bit sampling regime. We demonstrate, with a particular focus on the nuclear norm minimization, that the sample abundance paradigm allows for the transformation of such quadratic problems to merely a linear feasibility problem by forming a large-scale overdetermined linear system; thus removing the need for costly optimization constraints and objectives. To make this achievable, we propose enhanced randomized Kaczmarz algorithms to tackle these highly overdetermined feasibility problems. Several numerical results are presented to illustrate the effectiveness of the proposed methodologies. △ Less

Submitted 8 December, 2022; originally announced January 2023.

Comments: arXiv admin note: text overlap with arXiv:2203.08982

arXiv:2212.12606 [pdf, other]

A Convergence Rate for Manifold Neural Networks

Authors: Joyce Chew, Deanna Needell, Michael Perlmutter

Abstract: High-dimensional data arises in numerous applications, and the rapidly develo** field of geometric deep learning seeks to develop neural network architectures to analyze such data in non-Euclidean domains, such as graphs and manifolds. Recent work by Z. Wang, L. Ruiz, and A. Ribeiro has introduced a method for constructing manifold neural networks using the spectral decomposition of the Laplace… ▽ More High-dimensional data arises in numerous applications, and the rapidly develo** field of geometric deep learning seeks to develop neural network architectures to analyze such data in non-Euclidean domains, such as graphs and manifolds. Recent work by Z. Wang, L. Ruiz, and A. Ribeiro has introduced a method for constructing manifold neural networks using the spectral decomposition of the Laplace Beltrami operator. Moreover, in this work, the authors provide a numerical scheme for implementing such neural networks when the manifold is unknown and one only has access to finitely many sample points. The authors show that this scheme, which relies upon building a data-driven graph, converges to the continuum limit as the number of sample points tends to infinity. Here, we build upon this result by establishing a rate of convergence that depends on the intrinsic dimension of the manifold but is independent of the ambient dimension. We also discuss how the rate of convergence depends on the depth of the network and the number of filters used in each layer. △ Less

Submitted 20 July, 2023; v1 submitted 23 December, 2022; originally announced December 2022.

arXiv:2212.09858 [pdf, other]

Continuous Semi-Supervised Nonnegative Matrix Factorization

Authors: Michael R. Lindstrom, Xiaofu Ding, Feng Liu, Anand Somayajula, Deanna Needell

Abstract: Nonnegative matrix factorization can be used to automatically detect topics within a corpus in an unsupervised fashion. The technique amounts to an approximation of a nonnegative matrix as the product of two nonnegative matrices of lower rank. In this paper, we show this factorization can be combined with regression on a continuous response variable. In practice, the method performs better than re… ▽ More Nonnegative matrix factorization can be used to automatically detect topics within a corpus in an unsupervised fashion. The technique amounts to an approximation of a nonnegative matrix as the product of two nonnegative matrices of lower rank. In this paper, we show this factorization can be combined with regression on a continuous response variable. In practice, the method performs better than regression done after topics are identified and retrains interpretability. △ Less

Submitted 19 December, 2022; originally announced December 2022.

arXiv:2212.03962 [pdf, other]

Multi-Randomized Kaczmarz for Latent Class Regression

Authors: Erin George, Yotam Yaniv, Deanna Needell

Abstract: Linear regression is effective at identifying interpretable trends in a data set, but averages out potentially different effects on subgroups within data. We propose an iterative algorithm based on the randomized Kaczmarz (RK) method to automatically identify subgroups in data and perform linear regression on these groups simultaneously. We prove almost sure convergence for this method, as well as… ▽ More Linear regression is effective at identifying interpretable trends in a data set, but averages out potentially different effects on subgroups within data. We propose an iterative algorithm based on the randomized Kaczmarz (RK) method to automatically identify subgroups in data and perform linear regression on these groups simultaneously. We prove almost sure convergence for this method, as well as linear convergence in expectation under certain conditions. The result is an interpretable collection of different weight vectors for the regressor variables that capture the different trends within data. Furthermore, we experimentally validate our convergence results by demonstrating the method can successfully identify two trends within simulated data. △ Less

Submitted 7 December, 2022; originally announced December 2022.

arXiv:2212.00237 [pdf, other]

Inference of Media Bias and Content Quality Using Natural-Language Processing

Authors: Zehan Chao, Denali Molitor, Deanna Needell, Mason A. Porter

Abstract: Media bias can significantly impact the formation and development of opinions and sentiments in a population. It is thus important to study the emergence and development of partisan media and political polarization. However, it is challenging to quantitatively infer the ideological positions of media outlets. In this paper, we present a quantitative framework to infer both political bias and conte… ▽ More Media bias can significantly impact the formation and development of opinions and sentiments in a population. It is thus important to study the emergence and development of partisan media and political polarization. However, it is challenging to quantitatively infer the ideological positions of media outlets. In this paper, we present a quantitative framework to infer both political bias and content quality of media outlets from text, and we illustrate this framework with empirical experiments with real-world data. We apply a bidirectional long short-term memory (LSTM) neural network to a data set of more than 1 million tweets to generate a two-dimensional ideological-bias and content-quality measurement for each tweet. We then infer a ``media-bias chart'' of (bias, quality) coordinates for the media outlets by integrating the (bias, quality) measurements of the tweets of the media outlets. We also apply a variety of baseline machine-learning methods, such as a naive-Bayes method and a support-vector machine (SVM), to infer the bias and quality values for each tweet. All of these baseline approaches are based on a bag-of-words approach. We find that the LSTM-network approach has the best performance of the examined methods. Our results illustrate the importance of leveraging word order into machine-learning methods in text analysis. △ Less

Submitted 30 November, 2022; originally announced December 2022.

Comments: 21 pages, 7 figures, 4 tables

arXiv:2211.13496 [pdf, other]

Multi-scale Hybridized Topic Modeling: A Pipeline for Analyzing Unstructured Text Datasets via Topic Modeling

Authors: Keyi Cheng, Stefan Inzer, Adrian Leung, Xiaoxian Shen, Michael Perlmutter, Michael Lindstrom, Joyce Chew, Todd Presner, Deanna Needell

Abstract: We propose a multi-scale hybridized topic modeling method to find hidden topics from transcribed interviews more accurately and efficiently than traditional topic modeling methods. Our multi-scale hybridized topic modeling method (MSHTM) approaches data at different scales and performs topic modeling in a hierarchical way utilizing first a classical method, Nonnegative Matrix Factorization, and th… ▽ More We propose a multi-scale hybridized topic modeling method to find hidden topics from transcribed interviews more accurately and efficiently than traditional topic modeling methods. Our multi-scale hybridized topic modeling method (MSHTM) approaches data at different scales and performs topic modeling in a hierarchical way utilizing first a classical method, Nonnegative Matrix Factorization, and then a transformer-based method, BERTopic. It harnesses the strengths of both NMF and BERTopic. Our method can help researchers and the public better extract and interpret the interview information. Additionally, it provides insights for new indexing systems based on the topic level. We then deploy our method on real-world interview transcripts and find promising results. △ Less

Submitted 24 November, 2022; originally announced November 2022.

arXiv:2211.06391 [pdf, other]

Online Signal Recovery via Heavy Ball Kaczmarz

Authors: Benjamin Jarman, Yotam Yaniv, Deanna Needell

Abstract: Recovering a signal $x^\ast \in \mathbb{R}^n$ from a sequence of linear measurements is an important problem in areas such as computerized tomography and compressed sensing. In this work, we consider an online setting in which measurements are sampled one-by-one from some source distribution. We propose solving this problem with a variant of the Kaczmarz method with an additional heavy ball moment… ▽ More Recovering a signal $x^\ast \in \mathbb{R}^n$ from a sequence of linear measurements is an important problem in areas such as computerized tomography and compressed sensing. In this work, we consider an online setting in which measurements are sampled one-by-one from some source distribution. We propose solving this problem with a variant of the Kaczmarz method with an additional heavy ball momentum term. A popular technique for solving systems of linear equations, recent work has shown that the Kaczmarz method also enjoys linear convergence when applied to random measurement models, however convergence may be slowed when successive measurements are highly coherent. We demonstrate that the addition of heavy ball momentum may accelerate the convergence of the Kaczmarz method when data is coherent, and provide a theoretical analysis of the method culminating in a linear convergence guarantee for a wide class of source distributions. △ Less

Submitted 11 November, 2022; originally announced November 2022.

Comments: 6 pages

arXiv:2211.05749 [pdf, other]

Sketched Gaussian Model Linear Discriminant Analysis via the Randomized Kaczmarz Method

Authors: Jocelyn T. Chi, Deanna Needell

Abstract: We present sketched linear discriminant analysis, an iterative randomized approach to binary-class Gaussian model linear discriminant analysis (LDA) for very large data. We harness a least squares formulation and mobilize the stochastic gradient descent framework. Therefore, we obtain a randomized classifier with performance that is very comparable to that of full data LDA while requiring access t… ▽ More We present sketched linear discriminant analysis, an iterative randomized approach to binary-class Gaussian model linear discriminant analysis (LDA) for very large data. We harness a least squares formulation and mobilize the stochastic gradient descent framework. Therefore, we obtain a randomized classifier with performance that is very comparable to that of full data LDA while requiring access to only one row of the training data at a time. We present convergence guarantees for the sketched predictions on new data within a fixed number of iterations. These guarantees account for both the Gaussian modeling assumptions on the data and algorithmic randomness from the sketching procedure. Finally, we demonstrate performance with varying step-sizes and numbers of iterations. Our numerical experiments demonstrate that sketched LDA can offer a very viable alternative to full data LDA when the data may be too large for full data analysis. △ Less

Submitted 10 November, 2022; originally announced November 2022.

arXiv:2209.04968 [pdf, other]

Population-Based Hierarchical Non-negative Matrix Factorization for Survey Data

Authors: Xiaofu Ding, Xinyu Dong, Olivia McGough, Chenxin Shen, Annie Ulichney, Ruiyao Xu, William Swartworth, Jocelyn T. Chi, Deanna Needell

Abstract: Motivated by the problem of identifying potential hierarchical population structure on modern survey data containing a wide range of complex data types, we introduce population-based hierarchical non-negative matrix factorization (PHNMF). PHNMF is a variant of hierarchical non-negative matrix factorization based on feature similarity. As such, it enables an automatic and interpretable approach for… ▽ More Motivated by the problem of identifying potential hierarchical population structure on modern survey data containing a wide range of complex data types, we introduce population-based hierarchical non-negative matrix factorization (PHNMF). PHNMF is a variant of hierarchical non-negative matrix factorization based on feature similarity. As such, it enables an automatic and interpretable approach for identifying and understanding hierarchical structure in a data matrix constructed from a wide range of data types. Our numerical experiments on synthetic and real survey data demonstrate that PHNMF can recover latent hierarchical population structure in complex data with high accuracy. Moreover, the recovered subpopulation structure is meaningful and can be useful for improving downstream inference. △ Less

Submitted 11 September, 2022; originally announced September 2022.

arXiv:2209.02415 [pdf, other]

Automatic Infectious Disease Classification Analysis with Concept Discovery

Authors: Elena Sizikova, Joshua Vendrow, Xu Cao, Rachel Grotheer, Jamie Haddock, Lara Kassab, Alona Kryshchenko, Thomas Merkh, R. W. M. A. Madushani, Kenny Moise, Annie Ulichney, Huy V. Vo, Chuntian Wang, Megan Coffee, Kathryn Leonard, Deanna Needell

Abstract: Automatic infectious disease classification from images can facilitate needed medical diagnoses. Such an approach can identify diseases, like tuberculosis, which remain under-diagnosed due to resource constraints and also novel and emerging diseases, like monkeypox, which clinicians have little experience or acumen in diagnosing. Avoiding missed or delayed diagnoses would prevent further transmiss… ▽ More Automatic infectious disease classification from images can facilitate needed medical diagnoses. Such an approach can identify diseases, like tuberculosis, which remain under-diagnosed due to resource constraints and also novel and emerging diseases, like monkeypox, which clinicians have little experience or acumen in diagnosing. Avoiding missed or delayed diagnoses would prevent further transmission and improve clinical outcomes. In order to understand and trust neural network predictions, analysis of learned representations is necessary. In this work, we argue that automatic discovery of concepts, i.e., human interpretable attributes, allows for a deep understanding of learned information in medical image analysis tasks, generalizing beyond the training labels or protocols. We provide an overview of existing concept discovery approaches in medical image and computer vision communities, and evaluate representative methods on tuberculosis (TB) prediction and monkeypox prediction tasks. Finally, we propose NMFx, a general NMF formulation of interpretability by concept discovery that works in a unified way in unsupervised, weakly supervised, and supervised scenarios. △ Less

Submitted 14 November, 2022; v1 submitted 28 August, 2022; originally announced September 2022.

Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2022, November 28th, 2022, New Orleans, United States & Virtual, http://www.ml4h.cc, 13 pages

arXiv:2208.09723 [pdf, other]

doi 10.1109/TPAMI.2023.3261185

Matrix Completion with Cross-Concentrated Sampling: Bridging Uniform Sampling and CUR Sampling

Authors: HanQin Cai, Longxiu Huang, Pengyu Li, Deanna Needell

Abstract: While uniform sampling has been widely studied in the matrix completion literature, CUR sampling approximates a low-rank matrix via row and column samples. Unfortunately, both sampling models lack flexibility for various circumstances in real-world applications. In this work, we propose a novel and easy-to-implement sampling strategy, coined Cross-Concentrated Sampling (CCS). By bridging uniform s… ▽ More While uniform sampling has been widely studied in the matrix completion literature, CUR sampling approximates a low-rank matrix via row and column samples. Unfortunately, both sampling models lack flexibility for various circumstances in real-world applications. In this work, we propose a novel and easy-to-implement sampling strategy, coined Cross-Concentrated Sampling (CCS). By bridging uniform sampling and CUR sampling, CCS provides extra flexibility that can potentially save sampling costs in applications. In addition, we also provide a sufficient condition for CCS-based matrix completion. Moreover, we propose a highly efficient non-convex algorithm, termed Iterative CUR Completion (ICURC), for the proposed CCS model. Numerical experiments verify the empirical advantages of CCS and ICURC against uniform sampling and its baseline algorithms, on both synthetic and real-world datasets. △ Less

Submitted 21 March, 2023; v1 submitted 20 August, 2022; originally announced August 2022.

arXiv:2208.08561 [pdf, other]

Geometric Scattering on Measure Spaces

Authors: Joyce Chew, Matthew Hirn, Smita Krishnaswamy, Deanna Needell, Michael Perlmutter, Holly Steach, Siddharth Viswanath, Hau-Tieng Wu

Abstract: The scattering transform is a multilayered, wavelet-based transform initially introduced as a model of convolutional neural networks (CNNs) that has played a foundational role in our understanding of these networks' stability and invariance properties. Subsequently, there has been widespread interest in extending the success of CNNs to data sets with non-Euclidean structure, such as graphs and man… ▽ More The scattering transform is a multilayered, wavelet-based transform initially introduced as a model of convolutional neural networks (CNNs) that has played a foundational role in our understanding of these networks' stability and invariance properties. Subsequently, there has been widespread interest in extending the success of CNNs to data sets with non-Euclidean structure, such as graphs and manifolds, leading to the emerging field of geometric deep learning. In order to improve our understanding of the architectures used in this new field, several papers have proposed generalizations of the scattering transform for non-Euclidean data structures such as undirected graphs and compact Riemannian manifolds without boundary. In this paper, we introduce a general, unified model for geometric scattering on measure spaces. Our proposed framework includes previous work on geometric scattering as special cases but also applies to more general settings such as directed graphs, signed graphs, and manifolds with boundary. We propose a new criterion that identifies to which groups a useful representation should be invariant and show that this criterion is sufficient to guarantee that the scattering transform has desirable stability and invariance properties. Additionally, we consider finite measure spaces that are obtained from randomly sampling an unknown manifold. We propose two methods for constructing a data-driven graph on which the associated graph scattering transform approximates the scattering transform on the underlying manifold. Moreover, we use a diffusion-maps based approach to prove quantitative estimates on the rate of convergence of one of these approximations as the number of sample points tends to infinity. Lastly, we showcase the utility of our method on spherical images, directed graphs, and on high-dimensional single-cell data. △ Less

Submitted 13 October, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

MSC Class: 68T07

arXiv:2207.08171 [pdf, other]

SP2: A Second Order Stochastic Polyak Method

Authors: Shuang Li, William J. Swartworth, Martin Takáč, Deanna Needell, Robert M. Gower

Abstract: Recently the "SP" (Stochastic Polyak step size) method has emerged as a competitive adaptive method for setting the step sizes of SGD. SP can be interpreted as a method specialized to interpolated models, since it solves the interpolation equations. SP solves these equation by using local linearizations of the model. We take a step further and develop a method for solving the interpolation equatio… ▽ More Recently the "SP" (Stochastic Polyak step size) method has emerged as a competitive adaptive method for setting the step sizes of SGD. SP can be interpreted as a method specialized to interpolated models, since it solves the interpolation equations. SP solves these equation by using local linearizations of the model. We take a step further and develop a method for solving the interpolation equations that uses the local second-order approximation of the model. Our resulting method SP2 uses Hessian-vector products to speed-up the convergence of SP. Furthermore, and rather uniquely among second-order methods, the design of SP2 in no way relies on positive definite Hessian matrices or convexity of the objective function. We show SP2 is very competitive on matrix completion, non-convex test problems and logistic regression. We also provide a convergence theory on sums-of-quadratics. △ Less

Submitted 17 July, 2022; originally announced July 2022.

arXiv:2206.12554 [pdf, other]

doi 10.1088/1361-6420/aca78a

On Block Accelerations of Quantile Randomized Kaczmarz for Corrupted Systems of Linear Equations

Authors: Lu Cheng, Benjamin Jarman, Deanna Needell, Elizaveta Rebrova

Abstract: With the growth of large data as well as large-scale learning tasks, the need for efficient and robust linear system solvers is greater than ever. The randomized Kaczmarz method (RK) and similar stochastic iterative methods have received considerable recent attention due to their efficient implementation and memory footprint. These methods can tolerate streaming data, accessing only part of the da… ▽ More With the growth of large data as well as large-scale learning tasks, the need for efficient and robust linear system solvers is greater than ever. The randomized Kaczmarz method (RK) and similar stochastic iterative methods have received considerable recent attention due to their efficient implementation and memory footprint. These methods can tolerate streaming data, accessing only part of the data at a time, and can also approximate the least squares solution even if the system is affected by noise. However, when data is instead affected by large (possibly adversarial) corruptions, these methods fail to converge, as corrupted data points draw iterates far from the true solution. A recently proposed solution to this is the QuantileRK method, which avoids harmful corrupted data by exploring the space carefully as the method iterates. The exploration component requires the computation of quantiles of large samples from the system and is computationally much heavier than the subsequent iteration update. In this paper, we propose an approach that better uses the information obtained during exploration by incorporating an averaged version of the block Kaczmarz method. This significantly speeds up convergence, while still allowing for a constant fraction of the equations to be arbitrarily corrupted. We provide theoretical convergence guarantees as well as experimental supporting evidence. We also demonstrate that the classical projection-based block Kaczmarz method cannot be robust to sparse adversarial corruptions, but rather the blocking has to be carried out by averaging one-dimensional projections. △ Less

Submitted 21 December, 2022; v1 submitted 25 June, 2022; originally announced June 2022.

arXiv:2206.10078 [pdf, other]

The Manifold Scattering Transform for High-Dimensional Point Cloud Data

Authors: Joyce Chew, Holly R. Steach, Siddharth Viswanath, Hau-Tieng Wu, Matthew Hirn, Deanna Needell, Smita Krishnaswamy, Michael Perlmutter

Abstract: The manifold scattering transform is a deep feature extractor for data defined on a Riemannian manifold. It is one of the first examples of extending convolutional neural network-like operators to general manifolds. The initial work on this model focused primarily on its theoretical stability and invariance properties but did not provide methods for its numerical implementation except in the case… ▽ More The manifold scattering transform is a deep feature extractor for data defined on a Riemannian manifold. It is one of the first examples of extending convolutional neural network-like operators to general manifolds. The initial work on this model focused primarily on its theoretical stability and invariance properties but did not provide methods for its numerical implementation except in the case of two-dimensional surfaces with predefined meshes. In this work, we present practical schemes, based on the theory of diffusion maps, for implementing the manifold scattering transform to datasets arising in naturalistic systems, such as single cell genetics, where the data is a high-dimensional point cloud modeled as lying on a low-dimensional manifold. We show that our methods are effective for signal classification and manifold classification tasks. △ Less

Submitted 21 January, 2024; v1 submitted 20 June, 2022; originally announced June 2022.

Comments: Accepted for publication in the TAG in DS Workshop at ICML. For subsequent theoretical guarantees, please see Section 6 of arXiv:2208.08561

MSC Class: 68T07 ACM Class: I.2.6

arXiv:2204.03782 [pdf, ps, other]

Testing Positive Semidefiniteness Using Linear Measurements

Authors: Deanna Needell, William Swartworth, David P. Woodruff

Abstract: We study the problem of testing whether a symmetric $d \times d$ input matrix $A$ is symmetric positive semidefinite (PSD), or is $ε$-far from the PSD cone, meaning that $λ_{\min}(A) \leq - ε\|A\|_p$, where $\|A\|_p$ is the Schatten-$p$ norm of $A$. In applications one often needs to quickly tell if an input matrix is PSD, and a small distance from the PSD cone may be tolerable. We consider two we… ▽ More We study the problem of testing whether a symmetric $d \times d$ input matrix $A$ is symmetric positive semidefinite (PSD), or is $ε$-far from the PSD cone, meaning that $λ_{\min}(A) \leq - ε\|A\|_p$, where $\|A\|_p$ is the Schatten-$p$ norm of $A$. In applications one often needs to quickly tell if an input matrix is PSD, and a small distance from the PSD cone may be tolerable. We consider two well-studied query models for measuring efficiency, namely, the matrix-vector and vector-matrix-vector query models. We first consider one-sided testers, which are testers that correctly classify any PSD input, but may fail on a non-PSD input with a tiny failure probability. Up to logarithmic factors, in the matrix-vector query model we show a tight $\widetildeΘ(1/ε^{p/(2p+1)})$ bound, while in the vector-matrix-vector query model we show a tight $\widetildeΘ(d^{1-1/p}/ε)$ bound, for every $p \geq 1$. We also show a strong separation between one-sided and two-sided testers in the vector-matrix-vector model, where a two-sided tester can fail on both PSD and non-PSD inputs with a tiny failure probability. In particular, for the important case of the Frobenius norm, we show that any one-sided tester requires $\widetildeΩ(\sqrt{d}/ε)$ queries. However we introduce a bilinear sketch for two-sided testing from which we construct a Frobenius norm tester achieving the optimal $\widetilde{O}(1/ε^2)$ queries. We also give a number of additional separations between adaptive and non-adaptive testers. Our techniques have implications beyond testing, providing new methods to approximate the spectrum of a matrix with Frobenius norm error using dimensionality reduction in a way that preserves the signs of eigenvalues. △ Less

Submitted 25 October, 2023; v1 submitted 7 April, 2022; originally announced April 2022.

ACM Class: F.2.1

arXiv:2203.03551 [pdf, other]

Semi-supervised Nonnegative Matrix Factorization for Document Classification

Authors: Jamie Haddock, Lara Kassab, Sixian Li, Alona Kryshchenko, Rachel Grotheer, Elena Sizikova, Chuntian Wang, Thomas Merkh, RWMA Madushani, Miju Ahn, Deanna Needell, Kathryn Leonard

Abstract: We propose new semi-supervised nonnegative matrix factorization (SSNMF) models for document classification and provide motivation for these models as maximum likelihood estimators. The proposed SSNMF models simultaneously provide both a topic model and a model for classification, thereby offering highly interpretable classification results. We derive training methods using multiplicative updates f… ▽ More We propose new semi-supervised nonnegative matrix factorization (SSNMF) models for document classification and provide motivation for these models as maximum likelihood estimators. The proposed SSNMF models simultaneously provide both a topic model and a model for classification, thereby offering highly interpretable classification results. We derive training methods using multiplicative updates for each new model, and demonstrate the application of these models to single-label and multi-label document classification, although the models are flexible to other supervised learning tasks such as regression. We illustrate the promise of these models and training methods on document classification datasets (e.g., 20 Newsgroups, Reuters). △ Less

Submitted 28 February, 2022; originally announced March 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2010.07956

arXiv:2203.00095 [pdf, ps, other]

Distributed randomized Kaczmarz for the adversarial workers

Authors: Xia Li, Longxiu Huang, Deanna Needell

Abstract: Develo** large-scale distributed methods that are robust to the presence of adversarial or corrupted workers is an important part of making such methods practical for real-world problems. Here, we propose an iterative approach that is adversary-tolerant for least-squares problems. The algorithm utilizes simple statistics to guarantee convergence and is capable of learning the adversarial distrib… ▽ More Develo** large-scale distributed methods that are robust to the presence of adversarial or corrupted workers is an important part of making such methods practical for real-world problems. Here, we propose an iterative approach that is adversary-tolerant for least-squares problems. The algorithm utilizes simple statistics to guarantee convergence and is capable of learning the adversarial distributions. Additionally, the efficiency of the proposed method is shown in simulations in the presence of adversaries. The results demonstrate the great capability of such methods to tolerate different levels of adversary rates and to identify the erroneous workers with high accuracy. △ Less

Submitted 28 February, 2022; originally announced March 2022.

arXiv:2201.13324 [pdf, other]

Guided Semi-Supervised Non-negative Matrix Factorization on Legal Documents

Authors: Pengyu Li, Christine Tseng, Yaxuan Zheng, Joyce A. Chew, Longxiu Huang, Benjamin Jarman, Deanna Needell

Abstract: Classification and topic modeling are popular techniques in machine learning that extract information from large-scale datasets. By incorporating a priori information such as labels or important features, methods have been developed to perform classification and topic modeling tasks; however, most methods that can perform both do not allow for guidance of the topics or features. In this paper, we… ▽ More Classification and topic modeling are popular techniques in machine learning that extract information from large-scale datasets. By incorporating a priori information such as labels or important features, methods have been developed to perform classification and topic modeling tasks; however, most methods that can perform both do not allow for guidance of the topics or features. In this paper, we propose a method, namely Guided Semi-Supervised Non-negative Matrix Factorization (GSSNMF), that performs both classification and topic modeling by incorporating supervision from both pre-assigned document class labels and user-designed seed words. We test the performance of this method through its application to legal documents provided by the California Innocence Project, a nonprofit that works to free innocent convicted persons and reform the justice system. The results show that our proposed method improves both classification accuracy and topic coherence in comparison to past methods like Semi-Supervised Non-negative Matrix Factorization (SSNMF) and Guided Non-negative Matrix Factorization (Guided NMF). △ Less

Submitted 31 January, 2022; originally announced January 2022.

Comments: 14 pages, 4 figures

arXiv:2110.04703 [pdf, other]

Selectable Set Randomized Kaczmarz

Authors: Yotam Yaniv, Jacob D. Moorman, William Swartworth, Thomas Tu, Daji Landis, Deanna Needell

Abstract: The Randomized Kaczmarz method (RK) is a stochastic iterative method for solving linear systems that has recently grown in popularity due to its speed and low memory requirement. Selectable Set Randomized Kaczmarz (SSRK) is an variant of RK that leverages existing information about the Kaczmarz iterate to identify an adaptive "selectable set" and thus yields an improved convergence guarantee. In t… ▽ More The Randomized Kaczmarz method (RK) is a stochastic iterative method for solving linear systems that has recently grown in popularity due to its speed and low memory requirement. Selectable Set Randomized Kaczmarz (SSRK) is an variant of RK that leverages existing information about the Kaczmarz iterate to identify an adaptive "selectable set" and thus yields an improved convergence guarantee. In this paper, we propose a general perspective for selectable set approaches and prove a convergence result for that framework. In addition, we define two specific selectable set sampling strategies that have competitive convergence guarantees to those of other variants of RK. One selectable set sampling strategy leverages information about the previous iterate, while the other leverages the orthogonality structure of the problem via the Gramian matrix. We complement our theoretical results with numerical experiments that compare our proposed rules with those existing in the literature. △ Less

Submitted 2 February, 2022; v1 submitted 10 October, 2021; originally announced October 2021.

arXiv:2110.03114 [pdf, other]

On audio enhancement via online non-negative matrix factorization

Authors: Andrew Sack, Wenzhao Jiang, Michael Perlmutter, Palina Salanevich, Deanna Needell

Abstract: We propose a method for noise reduction, the task of producing a clean audio signal from a recording corrupted by additive noise. Many common approaches to this problem are based upon applying non-negative matrix factorization to spectrogram measurements. These methods use a noiseless recording, which is believed to be similar in structure to the signal of interest, and a pure-noise recording to l… ▽ More We propose a method for noise reduction, the task of producing a clean audio signal from a recording corrupted by additive noise. Many common approaches to this problem are based upon applying non-negative matrix factorization to spectrogram measurements. These methods use a noiseless recording, which is believed to be similar in structure to the signal of interest, and a pure-noise recording to learn dictionaries for the true signal and the noise. One may then construct an approximation of the true signal by projecting the corrupted recording on to the clean dictionary. In this work, we build upon these methods by proposing the use of \emph{online} non-negative matrix factorization for this problem. This method is more memory efficient than traditional non-negative matrix factorization and also has potential applications to real-time denoising. △ Less

Submitted 6 October, 2021; originally announced October 2021.

MSC Class: 94A12

arXiv:2109.14820 [pdf, other]

A Generalized Hierarchical Nonnegative Tensor Decomposition

Authors: Joshua Vendrow, Jamie Haddock, Deanna Needell

Abstract: Nonnegative matrix factorization (NMF) has found many applications including topic modeling and document analysis. Hierarchical NMF (HNMF) variants are able to learn topics at various levels of granularity and illustrate their hierarchical relationship. Recently, nonnegative tensor factorization (NTF) methods have been applied in a similar fashion in order to handle data sets with complex, multi-m… ▽ More Nonnegative matrix factorization (NMF) has found many applications including topic modeling and document analysis. Hierarchical NMF (HNMF) variants are able to learn topics at various levels of granularity and illustrate their hierarchical relationship. Recently, nonnegative tensor factorization (NTF) methods have been applied in a similar fashion in order to handle data sets with complex, multi-modal structure. Hierarchical NTF (HNTF) methods have been proposed, however these methods do not naturally generalize their matrix-based counterparts. Here, we propose a new HNTF model which directly generalizes a HNMF model special case, and provide a supervised extension. We also provide a multiplicative updates training method for this model. Our experimental results show that this model more naturally illuminates the topic hierarchy than previous HNMF and HNTF methods. △ Less

Submitted 15 February, 2022; v1 submitted 29 September, 2021; originally announced September 2021.

Comments: 6 pages, 2 figues, 3 tables

arXiv:2109.14079 [pdf, other]

Robust recovery of bandlimited graph signals via randomized dynamical sampling

Authors: Longxiu Huang, Deanna Needell, Sui Tang

Abstract: Heat diffusion processes have found wide applications in modelling dynamical systems over graphs. In this paper, we consider the recovery of a $k$-bandlimited graph signal that is an initial signal of a heat diffusion process from its space-time samples. We propose three random space-time sampling regimes, termed dynamical sampling techniques, that consist in selecting a small subset of space-time… ▽ More Heat diffusion processes have found wide applications in modelling dynamical systems over graphs. In this paper, we consider the recovery of a $k$-bandlimited graph signal that is an initial signal of a heat diffusion process from its space-time samples. We propose three random space-time sampling regimes, termed dynamical sampling techniques, that consist in selecting a small subset of space-time nodes at random according to some probability distribution. We show that the number of space-time samples required to ensure stable recovery for each regime depends on a parameter called the spectral graph weighted coherence, that depends on the interplay between the dynamics over the graphs and sampling probability distributions. In optimal scenarios, no more than $\mathcal{O}(k \log(k))$ space-time samples are sufficient to ensure accurate and stable recovery of all $k$-bandlimited signals. In any case, dynamical sampling typically requires much fewer spatial samples than the static case by leveraging the temporal information. Then, we propose a computationally efficient method to reconstruct $k$-bandlimited signals from their space-time samples. We prove that it yields accurate reconstructions and that it is also stable to noise. Finally, we test dynamical sampling techniques on a wide variety of graphs. The numerical results support our theoretical findings and demonstrate the efficiency. △ Less

Submitted 3 October, 2021; v1 submitted 28 September, 2021; originally announced September 2021.

Comments: corrected mistakes in plotting. arXiv admin note: text overlap with arXiv:1511.05118 by other authors

MSC Class: 94A20; 94A12

arXiv:2109.10454 [pdf, other]

Modewise Operators, the Tensor Restricted Isometry Property, and Low-Rank Tensor Recovery

Authors: Mark A. Iwen, Deanna Needell, Michael Perlmutter, Elizaveta Rebrova

Abstract: Recovery of sparse vectors and low-rank matrices from a small number of linear measurements is well-known to be possible under various model assumptions on the measurements. The key requirement on the measurement matrices is typically the restricted isometry property, that is, approximate orthonormality when acting on the subspace to be recovered. Among the most widely used random matrix measureme… ▽ More Recovery of sparse vectors and low-rank matrices from a small number of linear measurements is well-known to be possible under various model assumptions on the measurements. The key requirement on the measurement matrices is typically the restricted isometry property, that is, approximate orthonormality when acting on the subspace to be recovered. Among the most widely used random matrix measurement models are (a) independent sub-gaussian models and (b) randomized Fourier-based models, allowing for the efficient computation of the measurements. For the now ubiquitous tensor data, direct application of the known recovery algorithms to the vectorized or matricized tensor is awkward and memory-heavy because of the huge measurement matrices to be constructed and stored. In this paper, we propose modewise measurement schemes based on sub-gaussian and randomized Fourier measurements. These modewise operators act on the pairs or other small subsets of the tensor modes separately. They require significantly less memory than the measurements working on the vectorized tensor, provably satisfy the tensor restricted isometry property and experimentally can recover the tensor data from fewer measurements and do not require impractical storage. △ Less

Submitted 21 September, 2021; originally announced September 2021.

MSC Class: 15B52; 15A69; 15A83; 97N40

arXiv:2108.10448 [pdf, other]

Fast Robust Tensor Principal Component Analysis via Fiber CUR Decomposition

Authors: HanQin Cai, Zehan Chao, Longxiu Huang, Deanna Needell

Abstract: We study the problem of tensor robust principal component analysis (TRPCA), which aims to separate an underlying low-multilinear-rank tensor and a sparse outlier tensor from their sum. In this work, we propose a fast non-convex algorithm, coined Robust Tensor CUR (RTCUR), for large-scale TRPCA problems. RTCUR considers a framework of alternating projections and utilizes the recently developed tens… ▽ More We study the problem of tensor robust principal component analysis (TRPCA), which aims to separate an underlying low-multilinear-rank tensor and a sparse outlier tensor from their sum. In this work, we propose a fast non-convex algorithm, coined Robust Tensor CUR (RTCUR), for large-scale TRPCA problems. RTCUR considers a framework of alternating projections and utilizes the recently developed tensor Fiber CUR decomposition to dramatically lower the computational complexity. The performance advantage of RTCUR is empirically verified against the state-of-the-arts on the synthetic datasets and is further demonstrated on the real-world application such as color video background subtraction. △ Less

Submitted 23 August, 2021; originally announced August 2021.

Comments: Accepted to Workshop on Robust Subspace Learning and Applications in Computer Vision, International Conference on Computer Vision (ICCV) 2021

Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, pages 189-197, 2021

arXiv:2108.02304 [pdf, other]

QuantileRK: Solving Large-Scale Linear Systems with Corrupted, Noisy Data

Authors: Benjamin Jarman, Deanna Needell

Abstract: Measurement data in linear systems arising from real-world applications often suffers from both large, sparse corruptions, and widespread small-scale noise. This can render many popular solvers ineffective, as the least squares solution is far from the desired solution, and the underlying consistent system becomes harder to identify and solve. QuantileRK is a member of the Kaczmarz family of itera… ▽ More Measurement data in linear systems arising from real-world applications often suffers from both large, sparse corruptions, and widespread small-scale noise. This can render many popular solvers ineffective, as the least squares solution is far from the desired solution, and the underlying consistent system becomes harder to identify and solve. QuantileRK is a member of the Kaczmarz family of iterative projective methods that has been shown to converge exponentially for systems with arbitrarily large sparse corruptions. In this paper, we extend the analysis to the case where there are not only corruptions present, but also noise that may affect every data point, and prove that QuantileRK converges with the same rate up to an error threshold. We give both theoretical and experimental results demonstrating QuantileRK's strength. △ Less

Submitted 4 August, 2021; originally announced August 2021.

MSC Class: 65F10; 68W20

Showing 1–50 of 156 results for author: Needell, D