Search | arXiv e-print repository

Efficient Two-Stage Gaussian Process Regression Via Automatic Kernel Search and Subsampling

Authors: Shifan Zhao, Jiaying Lu, Ji Yang, Edmond Chow, Yuanzhe Xi

Abstract: Gaussian Process Regression (GPR) is widely used in statistics and machine learning for prediction tasks requiring uncertainty measures. Its efficacy depends on the appropriate specification of the mean function, covariance kernel function, and associated hyperparameters. Severe misspecifications can lead to inaccurate results and problematic consequences, especially in safety-critical applications. However, a systematic approach to handle these misspecifications is lacking in the literature. In this work, we propose a general framework to address these issues. Firstly, we introduce a flexible two-stage GPR framework that separates mean prediction and uncertainty quantification (UQ) to prevent mean misspecification, which can introduce bias into the model. Secondly, kernel function misspecification is addressed through a novel automatic kernel search algorithm, supported by theoretical analysis, that selects the optimal kernel from a candidate set. Additionally, we propose a subsampling-based warm-start strategy for hyperparameter initialization to improve efficiency and avoid hyperparameter misspecification. With much lower computational cost, our subsampling-based strategy can yield competitive or better performance than training exclusively on the full dataset. Combining all these components, we recommend two GPR methods, exact and scalable, designed to match available computational resources and specific UQ requirements. Extensive evaluation on real-world datasets, including UCI benchmarks and a safety-critical medical case study, demonstrates the robustness and precision of our methods.

Submitted 22 May, 2024; originally announced May 2024.

ACM Class: G.3; J.3

arXiv:2310.02246 [pdf, other]

Learning to Relax: Setting Solver Parameters Across a Sequence of Linear System Instances

Authors: Mikhail Khodak, Edmond Chow, Maria-Florina Balcan, Ameet Talwalkar

Abstract: Solving a linear system $Ax=b$ is a fundamental scientific computing primitive for which numerous solvers and preconditioners have been developed. These come with parameters whose optimal values depend on the system being solved and are often impossible or too expensive to identify; thus in practice sub-optimal heuristics are used. We consider the common setting in which many related linear systems need to be solved, e.g. during a single numerical simulation. In this scenario, can we sequentially choose parameters that attain a near-optimal overall number of iterations, without extra matrix computations? We answer in the affirmative for Successive Over-Relaxation (SOR), a standard solver whose parameter $ω$ has a strong impact on its runtime. For this method, we prove that a bandit online learning algorithm--using only the number of iterations as feedback--can select parameters for a sequence of instances such that the overall cost approaches that of the best fixed $ω$ as the sequence length increases. Furthermore, when given additional structural information, we show that a contextual bandit method asymptotically achieves the performance of the instance-optimal policy, which selects the best $ω$ for each instance. Our work provides the first learning-theoretic treatment of high-precision linear system solvers and the first end-to-end guarantees for data-driven scientific computing, demonstrating theoretically the potential to speed up numerical methods using well-understood learning algorithms.

Submitted 2 May, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

Comments: ICLR 2024 Spotlight
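
As a rough illustration of the setting in the "Learning to Relax" entry above, the following Python sketch runs SOR on a small test system and returns the number of sweeps needed for a given $ω$. The function names, the 1D Laplacian test matrix, the tolerance, and the stopping rule are all assumptions made for illustration. This is not the paper's bandit algorithm; it only shows the kind of iteration-count feedback such an algorithm would observe.

```python
def tridiagonal_laplacian(n):
    """Hypothetical test matrix: the n x n 1D Laplacian (2 on the diagonal, -1 beside it)."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]


def sor_iterations(A, b, omega, tol=1e-8, max_iter=10_000):
    """Run SOR on Ax = b starting from x = 0.

    Returns (x, sweeps), where sweeps is the number of full SOR sweeps
    performed before the max-norm residual drops below tol * max|b|.
    """
    n = len(b)
    x = [0.0] * n
    b_norm = max(abs(v) for v in b) or 1.0
    for sweep in range(1, max_iter + 1):
        for i in range(n):
            # Gauss-Seidel value for component i, using the latest entries of x.
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            gauss_seidel = (b[i] - sigma) / A[i][i]
            # Relaxation: blend the old value with the Gauss-Seidel value.
            x[i] = (1.0 - omega) * x[i] + omega * gauss_seidel
        residual = max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n)))
                       for i in range(n))
        if residual <= tol * b_norm:
            return x, sweep
    return x, max_iter


if __name__ == "__main__":
    A = tridiagonal_laplacian(20)
    b = [1.0] * 20
    # The sweep count is the only feedback a bandit-style learner would see.
    for omega in (1.0, 1.3, 1.5, 1.7, 1.9):
        _, sweeps = sor_iterations(A, b, omega)
        print(f"omega = {omega:.1f}: {sweeps} sweeps")
```

With $ω = 1$ the routine reduces to Gauss-Seidel, so comparing its sweep count against other values of $ω$ gives a simple view of how strongly the parameter affects cost on this test problem.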

arXiv:2304.05460 [pdf, other]

An Adaptive Factorized Nyström Preconditioner for Regularized Kernel Matrices

Authors: Shifan Zhao, Tianshi Xu, Hua Huang, Edmond Chow, Yuanzhe Xi

Abstract: The spectrum of a kernel matrix significantly depends on the parameter values of the kernel function used to define the kernel matrix. This makes it challenging to design a preconditioner for a regularized kernel matrix that is robust across different parameter values. This paper proposes the Adaptive Factorized Nyström (AFN) preconditioner. The preconditioner is designed for the case where the rank k of the Nyström approximation is large, i.e., for kernel function parameters that lead to kernel matrices with eigenvalues that decay slowly. AFN deliberately chooses a well-conditioned submatrix to solve with and corrects a Nyström approximation with a factorized sparse approximate matrix inverse. This makes AFN efficient for kernel matrices with large numerical ranks. AFN also adaptively chooses the size of this submatrix to balance accuracy and cost.

Submitted 9 April, 2024; v1 submitted 11 April, 2023; originally announced April 2023.

arXiv:2212.12674 [pdf, other]

Data-Driven Linear Complexity Low-Rank Approximation of General Kernel Matrices: A Geometric Approach

Authors: Difeng Cai, Edmond Chow, Yuanzhe Xi

Abstract: A general, rectangular kernel matrix may be defined as $K_{ij} = κ(x_i,y_j)$ where $κ(x,y)$ is a kernel function and where $X=\{x_i\}_{i=1}^m$ and $Y=\{y_i\}_{i=1}^n$ are two sets of points. In this paper, we seek a low-rank approximation to a kernel matrix where the sets of points $X$ and $Y$ are large and are arbitrarily distributed, such as away from each other, "intermingled", identical, etc. Such rectangular kernel matrices may arise, for example, in Gaussian process regression where $X$ corresponds to the training data and $Y$ corresponds to the test data. In this case, the points are often high-dimensional. Since the point sets are large, we must exploit the fact that the matrix arises from a kernel function and avoid forming the matrix, thus ruling out most algebraic techniques. In particular, we seek methods that can scale linearly or nearly linearly with respect to the size of data for a fixed approximation rank. The main idea in this paper is to geometrically select appropriate subsets of points to construct a low rank approximation. An analysis in this paper guides how this selection should be performed.

Submitted 28 June, 2023; v1 submitted 24 December, 2022; originally announced December 2022.

arXiv:2206.01885 [pdf, other]

Data-driven Construction of Hierarchical Matrices with Nested Bases

Authors: Difeng Cai, Hua Huang, Edmond Chow, Yuanzhe Xi

Abstract: Hierarchical matrices provide a powerful representation for significantly reducing the computational complexity associated with dense kernel matrices. For general kernel functions, interpolation-based methods are widely used for the efficient construction of hierarchical matrices. In this paper, we present a fast hierarchical data reduction (HiDR) procedure with $O(n)$ complexity for the memory-efficient construction of hierarchical matrices with nested bases where $n$ is the number of data points. HiDR aims to reduce the given data in a hierarchical way so as to obtain $O(1)$ representations for all nearfield and farfield interactions. Based on HiDR, a linear complexity $\mathcal{H}^2$ matrix construction algorithm is proposed. The use of data-driven methods enables better efficiency than other general-purpose methods and flexible computation without accessing the kernel function. Experiments demonstrate significantly improved memory efficiency of the proposed data-driven method compared to interpolation-based methods over a wide range of kernels. Though the method is not optimized for any special kernel, benchmark experiments for the Coulomb kernel show that the proposed general-purpose algorithm offers competitive performance for hierarchical matrix construction compared to several state-of-the-art algorithms for the Coulomb kernel.

Submitted 3 June, 2022; originally announced June 2022.

Comments: 26 pages, 20 figures

MSC Class: 15A23 (Primary); 68W25; 65D40 (Secondary)

arXiv:2011.07632 [pdf, ps, other]

Efficient construction of an HSS preconditioner for symmetric positive definite $\mathcal{H}^2$ matrices

Authors: Xin Xing, Hua Huang, Edmond Chow

Abstract: In an iterative approach for solving linear systems with ill-conditioned, symmetric positive definite (SPD) kernel matrices, both fast matrix-vector products and fast preconditioning operations are required. Fast (linear-scaling) matrix-vector products are available by expressing the kernel matrix in an $\mathcal{H}^2$ representation or an equivalent fast multipole method representation. Preconditioning such matrices, however, requires a structured matrix approximation that is more regular than the $\mathcal{H}^2$ representation, such as the hierarchically semiseparable (HSS) matrix representation, which provides fast solve operations. Previously, an algorithm was presented to construct an HSS approximation to an SPD kernel matrix that is guaranteed to be SPD. However, this algorithm has quadratic cost and was only designed for recursive binary partitionings of the points defining the kernel matrix. This paper presents a general algorithm for constructing an SPD HSS approximation. Importantly, the algorithm uses the $\mathcal{H}^2$ representation of the SPD matrix to reduce its computational complexity from quadratic to quasilinear. Numerical experiments illustrate how this SPD HSS approximation performs as a preconditioner for solving linear systems arising from a range of kernel functions.

Submitted 11 January, 2021; v1 submitted 15 November, 2020; originally announced November 2020.

MSC Class: 15B99; 65F99; 65Z05

arXiv:2010.09099 [pdf, other]

Decentralized and Secure Generation Maintenance with Differential Privacy

Authors: Paritosh Ramanan, Murat Yildirim, Nagi Gebraeel, Edmond Chow

Abstract: Decentralized methods are gaining popularity for data-driven models in power systems as they offer significant computational scalability while guaranteeing full data ownership by utility stakeholders. However, decentralized methods still require sharing information about network flow estimates over public-facing communication channels, which raises privacy concerns. In this paper we propose a differential privacy driven approach geared towards decentralized formulations of mixed integer operations and maintenance optimization problems that protects network flow estimates. We prove strong privacy guarantees by leveraging the linear relationship between the phase angles and the flow. To address the challenges associated with the mixed integer and dynamic nature of the problem, we introduce an exponential moving average based consensus mechanism to enhance convergence, coupled with a control chart based convergence criterion to improve stability. Our experimental results obtained on the IEEE 118 bus case demonstrate that our privacy preserving approach yields solution qualities on par with benchmark methods without differential privacy. To demonstrate the computational robustness of our method, we conduct experiments using a wide range of noise levels and operational scenarios.

Submitted 18 October, 2020; originally announced October 2020.

arXiv:2010.09055 [pdf, other]

Large-Scale Maintenance and Unit Commitment: A Decentralized Subgradient Approach

Authors: Paritosh Ramanan, Murat Yildirim, Nagi Gebraeel, Edmond Chow

Abstract: Unit Commitment (UC) is a fundamental problem in power system operations. When coupled with generation maintenance, the joint optimization problem poses significant computational challenges due to coupling constraints linking maintenance and UC decisions. Obviously, these challenges grow with the size of the network. With the introduction of sensors for monitoring generator health and condition-based maintenance (CBM), these challenges have been magnified. ADMM-based decentralized methods have shown promise in solving large-scale UC problems, especially in vertically integrated power systems. However, in their current form, these methods fail to deliver similar computational performance and scalability when considering the joint UC and CBM problem. This paper provides a novel decentralized optimization framework for solving large-scale, joint UC and CBM problems. Our approach relies on the novel use of the subgradient method to temporally decouple various subproblems of the ADMM-based formulation of the joint problem along the maintenance horizon. By effectively utilizing multithreading, our decentralized subgradient approach delivers superior computational performance and eliminates the need to move sensor data, thereby alleviating privacy and security concerns. Using experiments on large scale test cases, we show that our framework can provide a speedup of up to 50x as compared to various state of the art benchmarks without compromising on solution quality.

Submitted 7 March, 2022; v1 submitted 18 October, 2020; originally announced October 2020.

arXiv:2009.02015 [pdf, other]

Asynchronous Richardson iterations

Authors: Edmond Chow, Andreas Frommer, Daniel B. Szyld

Abstract: We consider asynchronous versions of the first and second order Richardson methods for solving linear systems of equations. These methods depend on parameters whose values are chosen a priori. We explore the parameter values that can be proven to give convergence of the asynchronous methods. This is the first such analysis for asynchronous second order methods. We find that for the first order method, the optimal parameter value for the synchronous case also gives an asynchronously convergent method. For the second order method, the parameter ranges for which we can prove asynchronous convergence do not contain the optimal parameter values for the synchronous iteration. In practice, however, the asynchronous second order iterations may still converge using the optimal parameter values, or parameter values close to the optimal ones, despite this result. We explore this behavior with a multithreaded parallel implementation of the asynchronous methods.

Submitted 4 September, 2020; originally announced September 2020.

Comments: 15 pages

Report number: Report 20-09-03 Department of Mathematics, Temple University; Preprint imacm_20_32, Institute of Mathematical Modelling, Analysis and Computational Mathematics University of Wuppertal

MSC Class: 65F10; 65N22; 15A06

ACM Class: F.2.1; G.1.3
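
For orientation on the "Asynchronous Richardson iterations" entry above, here is a minimal Python sketch of the ordinary synchronous first order Richardson iteration, $x_{k+1} = x_k + α(b - Ax_k)$, with the standard optimal parameter $α = 2/(λ_{\min} + λ_{\max})$ for a symmetric positive definite matrix. The function names, the 1D Laplacian test matrix (whose eigenvalues are known in closed form), and the stopping rule are assumptions made for illustration. The sketch does not implement the asynchronous or second order variants analyzed in the paper.

```python
import math


def laplacian_1d(n):
    """Hypothetical test matrix: the n x n 1D Laplacian (2 on the diagonal, -1 beside it)."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]


def optimal_alpha(n):
    """alpha = 2 / (lambda_min + lambda_max) for the 1D Laplacian.

    Its eigenvalues are 2 - 2*cos(j*pi/(n+1)) for j = 1..n.
    """
    lam_min = 2.0 - 2.0 * math.cos(math.pi / (n + 1))
    lam_max = 2.0 - 2.0 * math.cos(n * math.pi / (n + 1))
    return 2.0 / (lam_min + lam_max)


def matvec(A, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def richardson(A, b, alpha, tol=1e-8, max_iter=100_000):
    """Synchronous first order Richardson iteration starting from x = 0.

    Returns (x, updates), where updates is the number of updates applied
    before the max-norm residual drops below tol * max|b|.
    """
    x = [0.0] * len(b)
    b_norm = max(abs(v) for v in b) or 1.0
    for k in range(max_iter + 1):
        r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
        if max(abs(v) for v in r) <= tol * b_norm:
            return x, k
        # Every component is updated from the same residual (synchronous step).
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
    return x, max_iter


if __name__ == "__main__":
    n = 10
    A, b = laplacian_1d(n), [1.0] * n
    for alpha in (0.25, optimal_alpha(n)):
        _, updates = richardson(A, b, alpha)
        print(f"alpha = {alpha:.3f}: {updates} updates")
```

In an asynchronous version, different components of x would be updated by different threads using possibly stale values of the others, which is why the admissible parameter range has to be analyzed separately.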

arXiv:1811.04134 [pdf, ps, other]

An efficient method for block low-rank approximations for kernel matrix systems

Authors: Xin Xing, Edmond Chow

Abstract: In the iterative solution of dense linear systems from boundary integral equations or systems involving kernel matrices, the main challenges are the expensive matrix-vector multiplication and the storage cost which are usually tackled by hierarchical matrix techniques such as $\mathcal{H}$ and $\mathcal{H}^2$ matrices. However, hierarchical matrices also have a high construction cost that is dominated by the low-rank approximations of the sub-blocks of the kernel matrix. In this paper, an efficient method is proposed to give a low-rank approximation of the kernel matrix block $K(X_0, Y_0)$ in the form of an interpolative decomposition (ID) for a kernel function $K(x,y)$ and two properly located point sets $X_0, Y_0$. The proposed method combines the ID using strong rank-revealing QR (sRRQR), which is purely algebraic, with analytic kernel information to reduce the construction cost of a rank-$r$ approximation from $O(r|X_0||Y_0|)$, for ID using sRRQR alone, to $O(r|X_0|)$ which is not related to $|Y_0|$. Numerical experiments show that $\mathcal{H}^2$ matrix construction with the proposed algorithm only requires a computational cost linear in the matrix dimension.

Submitted 9 November, 2018; originally announced November 2018.

arXiv:1811.00131 [pdf, ps, other]

doi 10.1016/j.acha.2019.11.003

Error analysis of an accelerated interpolative decomposition for 3D Laplace problems

Authors: Xin Xing, Edmond Chow

Abstract: In constructing the $\mathcal{H}^2$ representation of dense matrices defined by the Laplace kernel, the interpolative decomposition of certain off-diagonal submatrices that dominates the computation can be dramatically accelerated using the concept of a proxy surface. We refer to the computation of such interpolative decompositions as the proxy surface method. We present an error bound for the proxy surface method in the 3D case and thus provide theoretical guidance for the discretization of the proxy surface in the method.

Submitted 31 October, 2018; originally announced November 2018.

MSC Class: 65G99

Journal ref: Applied and Computational Harmonic Analysis Volume 49, Issue 1, July 2020, Pages 316-327

arXiv:1808.08172 [pdf, other]

Asynchronous One-Level and Two-Level Domain Decomposition Solvers

Authors: Christian Glusa, Paritosh Ramanan, Erik G. Boman, Edmond Chow, Sivasankaran Rajamanickam

Abstract: Parallel implementations of linear iterative solvers generally alternate between phases of data exchange and phases of local computation. Increasingly large problem sizes on more heterogeneous systems make load balancing and network layout very challenging tasks. In particular, global communication patterns such as inner products become increasingly limiting at scale. We explore the use of asynchronous communication based on one-sided MPI primitives in a multitude of domain decomposition solvers. In particular, a scalable asynchronous two-level method is presented. We discuss practical issues encountered in the development of a scalable solver and show experimental results obtained on state-of-the-art supercomputer systems that illustrate the benefits of asynchronous solvers in load balanced as well as load imbalanced scenarios. Using the novel method, we can observe speed-ups of up to 4x over its classical synchronous equivalent.

Submitted 10 August, 2020; v1 submitted 24 August, 2018; originally announced August 2018.

MSC Class: 68W10; 65Y05; 68W15; 65N55

arXiv:1705.05443 [pdf, other]

SMASH: Structured matrix approximation by separation and hierarchy

Authors: Difeng Cai, Edmond Chow, Yousef Saad, Yuanzhe Xi

Abstract: This paper presents an efficient method to perform Structured Matrix Approximation by Separation and Hierarchy (SMASH), when the original dense matrix is associated with a kernel function. Given points in a domain, a tree structure is first constructed based on an adaptive partitioning of the computational domain to facilitate subsequent approximation procedures. In contrast to existing schemes based on either analytic or purely algebraic approximations, SMASH takes advantage of both approaches and greatly improves the efficiency. The algorithm follows a bottom-up traversal of the tree and is able to perform the operations associated with each node on the same level in parallel. A strong rank-revealing factorization is applied to the initial analytic approximation in the separation regime so that a special structure is incorporated into the final nested bases. As a consequence, the storage is significantly reduced on one hand and a hierarchy of the original grid is constructed on the other hand. Due to this hierarchy, nested bases at upper levels can be computed in a similar way as the leaf level operations but on coarser grids. The main advantages of SMASH include its simplicity of implementation, its flexibility to construct various hierarchical rank structures and a low storage cost. Rigorous error analysis and complexity analysis are conducted to show that this scheme is fast and stable. The efficiency and robustness of SMASH are demonstrated through various test problems arising from integral equations, structured matrices, etc.

Submitted 15 May, 2017; originally announced May 2017.

Showing 1–13 of 13 results for author: Chow, E