-
A Robust Two-Level Schwarz Preconditioner For Sparse Matrices
Authors:
Hussam Al Daas,
Pierre Jolivet,
Frédéric Nataf,
Pierre-Henri Tournier
Abstract:
This paper introduces a fully algebraic two-level additive Schwarz preconditioner for general sparse large-scale matrices. The preconditioner is analyzed for symmetric positive definite (SPD) matrices. For those matrices, the coarse space is constructed based on approximating two local subspaces in each subdomain. These subspaces are obtained by approximating a number of eigenvectors corresponding to dominant eigenvalues of two judiciously posed generalized eigenvalue problems. The number of eigenvectors can be chosen to control the condition number. For general sparse matrices, the coarse space is constructed by approximating the image of a local operator that can be defined from information in the coefficient matrix. The connection between the coarse spaces for SPD and general matrices is also discussed. Numerical experiments show the great effectiveness of the proposed preconditioners on matrices arising from a wide range of applications. The set of matrices includes SPD, symmetric indefinite, nonsymmetric, and saddle-point matrices. In addition, we compare the proposed preconditioners to the state-of-the-art domain decomposition preconditioners.
Submitted 8 January, 2024;
originally announced January 2024.
-
Communication Lower Bounds and Optimal Algorithms for Multiple Tensor-Times-Matrix Computation
Authors:
Hussam Al Daas,
Grey Ballard,
Laura Grigori,
Suraj Kumar,
Kathryn Rouse
Abstract:
Multiple Tensor-Times-Matrix (Multi-TTM) is a key computation in algorithms for computing and operating with the Tucker tensor decomposition, which is frequently used in multidimensional data analysis. We establish communication lower bounds that determine how much data movement is required to perform the Multi-TTM computation in parallel. The crux of the proof relies on analytically solving a constrained, nonlinear optimization problem. We also present a parallel algorithm to perform this computation that organizes the processors into a logical grid with twice as many modes as the input tensor. We show that with correct choices of grid dimensions, the communication cost of the algorithm attains the lower bounds and is therefore communication optimal. Finally, we show that our algorithm can significantly reduce communication compared to the straightforward approach of expressing the computation as a sequence of tensor-times-matrix operations.
Submitted 2 February, 2023; v1 submitted 21 July, 2022;
originally announced July 2022.
-
Tight Memory-Independent Parallel Matrix Multiplication Communication Lower Bounds
Authors:
Hussam Al Daas,
Grey Ballard,
Laura Grigori,
Suraj Kumar,
Kathryn Rouse
Abstract:
Communication lower bounds have long been established for matrix multiplication algorithms. However, most methods of asymptotic analysis have either ignored the constant factors or not obtained the tightest possible values. Recent work has demonstrated that more careful analysis improves the best known constants for some classical matrix multiplication lower bounds and helps to identify more efficient algorithms that match the leading-order terms in the lower bounds exactly and improve practical performance. The main result of this work is the establishment of memory-independent communication lower bounds with tight constants for parallel matrix multiplication. Our constants improve on previous work in each of three cases that depend on the relative sizes of the aspect ratios of the matrices.
Submitted 26 May, 2022;
originally announced May 2022.
-
Efficient Algebraic Two-Level Schwarz Preconditioner For Sparse Matrices
Authors:
Hussam Al Daas,
Pierre Jolivet,
Tyrone Rees
Abstract:
Domain decomposition methods are among the most efficient for solving sparse linear systems of equations. Their effectiveness relies on a judiciously chosen coarse space. Originally introduced and theoretically proved to be efficient for self-adjoint operators, spectral coarse spaces have been proposed in the past few years for indefinite and non-self-adjoint operators. This paper presents a new spectral coarse space that can be constructed in a fully algebraic way, unlike most existing spectral coarse spaces. We present a theoretical convergence result for Hermitian positive definite diagonally dominant matrices. Numerical experiments and comparisons against state-of-the-art preconditioners in the multigrid community show that the resulting two-level Schwarz preconditioner is efficient, especially for non-self-adjoint operators. Furthermore, in this case, our proposed preconditioner outperforms state-of-the-art preconditioners.
Submitted 6 January, 2022;
originally announced January 2022.
-
Randomized algorithms for rounding in the Tensor-Train format
Authors:
Hussam Al Daas,
Grey Ballard,
Paul Cazeaux,
Eric Hallman,
Agnieszka Miedlar,
Mirjeta Pasha,
Tim W. Reid,
Arvind K. Saibaba
Abstract:
The Tensor-Train (TT) format is a highly compact low-rank representation for high-dimensional tensors. TT is particularly useful when representing approximations to the solutions of certain types of parametrized partial differential equations. For many of these problems, computing the solution explicitly would require an infeasible amount of memory and computational time. While the TT format makes these problems tractable, iterative techniques for solving the PDEs must be adapted to perform arithmetic while maintaining the implicit structure. The fundamental operation used to maintain feasible memory and computational time is called rounding, which truncates the internal ranks of a tensor already in TT format. We propose several randomized algorithms for this task that are generalizations of randomized low-rank matrix approximation algorithms and provide significant reduction in computation compared to deterministic TT-rounding algorithms. Randomization is particularly effective in the case of rounding a sum of TT-tensors (where we observe 20x speedup), which is the bottleneck computation in the adaptation of GMRES to vectors in TT format. We present the randomized algorithms and compare their empirical accuracy and computational time with deterministic alternatives.
Submitted 8 October, 2021;
originally announced October 2021.
-
A Robust Algebraic Multilevel Domain Decomposition Preconditioner For Sparse Symmetric Positive Definite Matrices
Authors:
Hussam Al Daas,
Pierre Jolivet
Abstract:
Domain decomposition (DD) methods are widely used as preconditioner techniques. Their effectiveness relies on the choice of a locally constructed coarse space. Thus far, this construction was mostly achieved using non-assembled matrices from discretized partial differential equations (PDEs). Therefore, DD methods were mainly successful when solving systems stemming from PDEs. In this paper, we present a fully algebraic multilevel DD method where the coarse space can be constructed locally and efficiently without any information besides the coefficient matrix. The condition number of the preconditioned matrix can be bounded by a user-prescribed number. Numerical experiments illustrate the effectiveness of the preconditioner on a range of problems arising from different applications.
Submitted 13 September, 2021;
originally announced September 2021.
-
A Robust Algebraic Domain Decomposition Preconditioner for Sparse Normal Equations
Authors:
Hussam Al Daas,
Pierre Jolivet,
Jennifer Scott
Abstract:
Solving the normal equations corresponding to large sparse linear least-squares problems is an important and challenging problem. For very large problems, an iterative solver is needed and, in general, a preconditioner is required to achieve good convergence. In recent years, a number of preconditioners have been proposed. These are largely serial and reported results demonstrate that none of the commonly used preconditioners for the normal equations matrix is capable of solving all sparse least-squares problems. Our interest is thus in designing new preconditioners for the normal equations that are efficient, robust, and can be implemented in parallel. Our proposed preconditioners can be constructed efficiently and algebraically without any knowledge of the problem and without any assumption on the least-squares matrix except that it is sparse. We exploit the structure of the symmetric positive definite normal equations matrix and use the concept of algebraic local symmetric positive semi-definite splittings to introduce two-level Schwarz preconditioners for least-squares problems. The condition number of the preconditioned normal equations is shown to be theoretically bounded independently of the number of subdomains in the splitting. This upper bound can be adjusted using a single parameter $τ$ that the user can specify. We discuss how the new preconditioners can be implemented on top of the PETSc library using only 150 lines of Fortran, C, or Python code. Problems arising from practical applications are used to compare the performance of the proposed new preconditioner with that of other preconditioners.
Submitted 2 January, 2022; v1 submitted 19 July, 2021;
originally announced July 2021.
-
Two-level Nyström--Schur preconditioner for sparse symmetric positive definite matrices
Authors:
Hussam Al Daas,
Tyrone Rees,
Jennifer Scott
Abstract:
Randomized methods are becoming increasingly popular in numerical linear algebra. However, few attempts have been made to use them in developing preconditioners. Our interest lies in solving large-scale sparse symmetric positive definite linear systems of equations where the system matrix is preordered to doubly bordered block diagonal form (for example, using a nested dissection ordering). We investigate the use of randomized methods to construct high quality preconditioners. In particular, we propose a new and efficient approach that employs Nyström's method for computing low rank approximations to develop robust algebraic two-level preconditioners. Construction of the new preconditioners involves iteratively solving a smaller but denser symmetric positive definite Schur complement system with multiple right-hand sides. Numerical experiments on problems coming from a range of application areas demonstrate that this inner system can be solved cheaply using block conjugate gradients and that using a large convergence tolerance to limit the cost does not adversely affect the quality of the resulting Nyström--Schur two-level preconditioner.
Submitted 27 July, 2021; v1 submitted 28 January, 2021;
originally announced January 2021.
-
Parallel Algorithms for Tensor Train Arithmetic
Authors:
Hussam Al Daas,
Grey Ballard,
Peter Benner
Abstract:
We present efficient and scalable parallel algorithms for performing mathematical operations for low-rank tensors represented in the tensor train (TT) format. We consider algorithms for addition, elementwise multiplication, computing norms and inner products, orthogonalization, and rounding (rank truncation). These are the kernel operations for applications such as iterative Krylov solvers that exploit the TT structure. The parallel algorithms are designed for distributed-memory computation, and we use a data distribution and strategy that parallelizes computations for individual cores within the TT format. We analyze the computation and communication costs of the proposed algorithms to show their scalability, and we present numerical experiments that demonstrate their efficiency on both shared-memory and distributed-memory parallel systems. For example, we observe better single-core performance than the existing MATLAB TT-Toolbox in rounding a 2GB TT tensor, and our implementation achieves a $34\times$ speedup using all 40 cores of a single node. We also show nearly linear parallel scaling on larger TT tensors up to over 10,000 cores for all mathematical operations.
Submitted 7 September, 2021; v1 submitted 12 November, 2020;
originally announced November 2020.
-
Low-Rank and Total Variation Regularization and Its Application to Image Recovery
Authors:
Pawan Goyal,
Hussam Al Daas,
Peter Benner
Abstract:
In this paper, we study the problem of image recovery from given partial (corrupted) observations. Recovering an image using a low-rank model has been an active research area in data analysis and machine learning. But often, images are not only of low-rank but they also exhibit sparsity in a transformed space. In this work, we propose a new problem formulation in such a way that we seek to recover an image that is of low-rank and has sparsity in a transformed domain. We further discuss various non-convex non-smooth surrogates of the rank function, leading to a relaxed problem. Then, we present an efficient iterative scheme to solve the relaxed problem that essentially employs the (weighted) singular value thresholding at each iteration. Furthermore, we discuss the convergence properties of the proposed iterative method. We perform extensive experiments, showing that the proposed algorithm outperforms state-of-the-art methodologies in recovering images.
Submitted 12 March, 2020;
originally announced March 2020.