-
Deriving Algorithms for Triangular Tridiagonalization of a (Skew-)Symmetric Matrix
Authors:
Robert van de Geijn,
Maggie Myers,
RuQing G. Xu,
Devin Matthews
Abstract:
We apply the FLAME methodology to derive algorithms hand in hand with their proofs of correctness for the computation of the $ L T L^T $ decomposition (with and without pivoting) of a skew-symmetric matrix. The approach yields known as well as new algorithms, presented using the FLAME notation. A number of BLAS-like primitives are exposed at the core of blocked algorithms that can attain high performance. The insights can be easily extended to yield algorithms for computing the $ L T L^T $ decomposition of a symmetric matrix.
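The sketch below gives a concrete reference point for the decomposition. It computes a pivoted $L T L^T$ factorization of a real skew-symmetric matrix using the classical unblocked, right-looking Parlett-Reid scheme, which applies a Gauss transform from both sides at each step. It is not one of the FLAME-derived or blocked variants from the paper. The function name `ltlt_skew` and the use of NumPy are assumptions made for illustration.

```python
import numpy as np


def ltlt_skew(A):
    """Pivoted LTL^T factorization of a real skew-symmetric matrix (sketch).

    Returns (perm, L, T) with A[np.ix_(perm, perm)] ~= L @ T @ L.T, where
    L is unit lower triangular (first column e_0, entries bounded by 1 in
    magnitude) and T is skew-symmetric tridiagonal. Unblocked, right-looking
    Parlett-Reid variant; hypothetical helper, not the paper's code.
    """
    A = np.array(A, dtype=float)          # work on a copy; it becomes T
    n = A.shape[0]
    L = np.eye(n)
    perm = np.arange(n)
    for k in range(n - 2):
        # Partial pivoting: move the largest |A[i, k]|, i > k, into row k+1.
        p = k + 1 + int(np.argmax(np.abs(A[k + 1:, k])))
        if p != k + 1:
            A[[k + 1, p], :] = A[[p, k + 1], :]   # symmetric row/column swap
            A[:, [k + 1, p]] = A[:, [p, k + 1]]   # keeps A skew-symmetric
            perm[[k + 1, p]] = perm[[p, k + 1]]
            # Swap the already-computed columns 0..k of L, as in pivoted LU.
            L[[k + 1, p], :k + 1] = L[[p, k + 1], :k + 1]
        if A[k + 1, k] == 0.0:
            continue                        # column k is already zero below k+1
        l = A[k + 2:, k] / A[k + 1, k]      # multipliers, |l| <= 1
        L[k + 2:, k + 1] = l
        # A <- M A M^T with the Gauss transform M = I - l e_{k+1}^T:
        A[k + 2:, :] -= np.outer(l, A[k + 1, :])   # row operations
        A[:, k + 2:] -= np.outer(A[:, k + 1], l)   # matching column operations
    return perm, L, A
```

If P is the permutation matrix encoded by `perm`, then $P A P^T = L T L^T$ with $\det L = 1$. Because $\mathrm{Pf}(B X B^T) = \det(B)\,\mathrm{Pf}(X)$, the Pfaffian of an even-order $A$ is the sign of the permutation times $T_{01} T_{23} \cdots T_{n-2,n-1}$.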
Submitted 17 November, 2023;
originally announced November 2023.
-
GEMMFIP: Unifying GEMM in BLIS
Authors:
RuQing G. Xu,
Field G. Van Zee,
Robert A. van de Geijn
Abstract:
Matrix libraries often focus on achieving high performance for problems considered to be either "small" or "large", as these two scenarios tend to respond best to different optimization strategies. We propose a unified technique for implementing matrix operations like general matrix multiplication (GEMM) that can achieve high performance for both small and large problem sizes. The key is to fuse packing (an operation that copies data into a contiguous layout in memory and that is critical for large-matrix performance) with the first computational "pass" over that data. This boosts performance across the problem size spectrum. As a result, tuning general-purpose libraries becomes simpler, since the technique obviates the need to carefully express and parameterize logic that chooses between a "small matrix" strategy and a "large matrix" strategy. A prototype implementation of the technique built with the BLAS-like Library Instantiation Software (BLIS) framework is described, and performance on a range of architectures is reported.
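The toy NumPy sketch below makes the idea of fusing packing with the first pass concrete. It computes C = A B one small tile at a time. The first time a micro-panel of B is used, the hypothetical `micro_kernel` reads it from B's original layout and copies each row into a contiguous buffer as that row is consumed. Later tiles reuse the packed copy. This only illustrates the control flow under simplifying assumptions (no cache blocking, no packing of A, no SIMD). It is not BLIS's or GEMMFIP's actual loop structure or API.

```python
import numpy as np


def micro_kernel(A_blk, B_src, C_blk, pack_dst=None):
    """C_blk += A_blk @ B_src as a sequence of rank-1 updates (hypothetical).

    If pack_dst is given, each row of B_src is copied into the contiguous
    buffer pack_dst as it is consumed, i.e. packing fused with computation.
    """
    for p in range(A_blk.shape[1]):
        b_row = B_src[p]
        if pack_dst is not None:
            pack_dst[p] = b_row          # fused pack: copy while computing
        C_blk += np.outer(A_blk[:, p], b_row)   # in-place update of the view


def gemm_fused_pack(A, B, mr=4, nr=4):
    """Toy C = A @ B that packs each nr-wide micro-panel of B on first use."""
    m, k = A.shape
    n = B.shape[1]
    C = np.zeros((m, n))
    packed = {}                          # j0 -> contiguous copy of B[:, j0:j0+nr]
    for i0 in range(0, m, mr):
        A_blk = A[i0:i0 + mr, :]
        for j0 in range(0, n, nr):
            C_blk = C[i0:i0 + mr, j0:j0 + nr]    # view into C
            if j0 not in packed:
                # First pass over this micro-panel: read B in place, pack as we go.
                buf = np.empty((k, min(nr, n - j0)))
                micro_kernel(A_blk, B[:, j0:j0 + nr], C_blk, pack_dst=buf)
                packed[j0] = buf
            else:
                # Later passes read the packed copy instead of B.
                micro_kernel(A_blk, packed[j0], C_blk)
    return C
```

As we read the abstract, the intended payoff is that no separate packing sweep is paid up front. Small problems, which touch each panel only a few times, avoid that overhead, and large problems still spend most of their time computing from packed buffers.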
Submitted 16 February, 2023; v1 submitted 16 February, 2023;
originally announced February 2023.
-
Optimized Implementation for Calculation and Fast-Update of Pfaffians Installed to the Open-Source Fermionic Variational Solver mVMC
Authors:
RuQing G. Xu,
Tsuyoshi Okubo,
Synge Todo,
Masatoshi Imada
Abstract:
In this article, we present a high-performance, portable, and well-templated implementation for computing and fast-updating the Pfaffian and inverse of an even-dimensional skew-symmetric (antisymmetric) matrix. This is achieved with a skew-symmetric, blocked variant of the Parlett-Reid algorithm and a blocked update scheme based on the Woodbury matrix identity. Installing this framework into the geminal-wavefunction-based many-variable Variational Monte Carlo (mVMC) code boosts sampling performance by up to a factor of more than $6$ without changing the Markov chain's behavior. The implementation is based on an extension of the BLAS-like Library Instantiation Software (BLIS) framework, which has optimized kernels for many state-of-the-art processors, including Intel Skylake-X, AMD EPYC Rome, and Fujitsu A64FX.
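The sketch below illustrates the two ingredients named above at the level of the mathematics, not performance. The first is a Pfaffian computed by pivoted, skew-symmetric Gauss elimination: an unblocked Parlett-Reid-style scheme that takes a 2 × 2 Schur complement per step. The second is a Woodbury-based update of the inverse when one row and column of the matrix change, the kind of low-rank change a single-electron move typically produces in a Pfaffian wavefunction. The helper names `pfaffian` and `replace_row_col`, the restriction to a single row-and-column update, and the use of NumPy are illustrative assumptions. mVMC's implementation is blocked and built on BLIS.

```python
import numpy as np


def pfaffian(A):
    """Pfaffian of a real skew-symmetric matrix via pivoted 2x2 Schur complements."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    if n % 2 == 1:
        return 0.0                      # odd order: the Pfaffian is zero
    pf = 1.0
    for k in range(0, n - 1, 2):
        # Pivot: bring the largest |A[i, k]|, i > k, to position (k+1, k).
        p = k + 1 + int(np.argmax(np.abs(A[k + 1:, k])))
        if p != k + 1:
            A[[k + 1, p], :] = A[[p, k + 1], :]
            A[:, [k + 1, p]] = A[:, [p, k + 1]]
            pf = -pf                    # one symmetric swap flips the sign
        if A[k + 1, k] == 0.0:
            return 0.0                  # singular matrix
        pf *= A[k, k + 1]               # Pfaffian of the leading 2x2 block
        if k + 2 < n:
            # Schur complement of the 2x2 block; it stays skew-symmetric.
            tau = A[k, k + 2:] / A[k, k + 1]
            c = A[k + 2:, k + 1].copy()
            A[k + 2:, k + 2:] += np.outer(tau, c) - np.outer(c, tau)
    return pf


def replace_row_col(A, A_inv, i, v):
    """Set column i of skew-symmetric A to v (row i to -v^T; v[i] is ignored).

    Returns (Pf(A_new) / Pf(A), A_new, inverse of A_new), writing
    A_new = A + B C B^T with B = [u, e_i] and C = [[0, 1], [-1, 0]].
    """
    n = A.shape[0]
    u = np.asarray(v, dtype=float) - A[:, i]
    u[i] = 0.0                           # the diagonal entry must stay zero
    e = np.zeros(n)
    e[i] = 1.0
    B = np.column_stack([u, e])          # n x 2
    C = np.array([[0.0, 1.0], [-1.0, 0.0]])
    A_new = A + B @ C @ B.T
    A_inv_B = A_inv @ B                  # O(n^2) work
    ratio = 1.0 + A_inv_B[i, 0]          # Pf(A_new)/Pf(A) = 1 + (A^{-1} u)_i
    # Woodbury: (A + B C B^T)^{-1} = A^{-1} - A^{-1} B K^{-1} B^T A^{-1}
    K = np.linalg.inv(C) + B.T @ A_inv_B         # 2 x 2 capacitance matrix
    A_inv_new = A_inv - A_inv_B @ np.linalg.solve(K, B.T @ A_inv)
    return ratio, A_new, A_inv_new
```

The ratio formula follows because the Pfaffian is affine in the entries of row and column i, with $d\,\mathrm{Pf}(A) = \tfrac{1}{2}\mathrm{Pf}(A)\,\mathrm{tr}(A^{-1}dA)$. Each update costs $O(n^2)$ operations rather than the $O(n^3)$ of refactorizing.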
Submitted 8 April, 2022; v1 submitted 27 May, 2021;
originally announced May 2021.
-
Scaling dimensions from linearized tensor renormalization group transformations
Authors:
Xinliang Lyu,
RuQing G. Xu,
Naoki Kawashima
Abstract:
We show a way to perform the canonical renormalization group (RG) prescription in tensor space: write down the tensor RG equation, linearize it around a fixed-point tensor, and diagonalize the resulting linearized RG equation to obtain scaling dimensions. Tensor RG methods have had great success in producing accurate free energies compared with conventional real-space RG schemes. However, the above-mentioned canonical procedure has not been implemented for general tensor-network-based RG schemes. We extend the success of the tensor methods further to the extraction of scaling dimensions through the canonical RG prescription, without explicitly using conformal field theory. This approach is benchmarked on the Ising model in 1D and 2D. Because it rests on a pure RG argument, the proposed method has potential applications to 3D systems, where the existing bread-and-butter method is inapplicable.
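The canonical procedure named above (take a fixed point, linearize the RG map around it, diagonalize) can be sketched generically. Below, the Jacobian of an RG map is estimated by central finite differences. Each eigenvalue $\lambda$ of the linearized map is then converted to a scaling dimension through the standard relation $\lambda = b^{d-\Delta}$, where $b$ is the rescaling factor and $d$ the spatial dimension. The map `toy_rg` is a hypothetical stand-in with a fixed point at the origin. Its eigenvalues are chosen to reproduce the 2D Ising values $\Delta = 0, 1/8, 1$. It is not a tensor RG transformation, and a real tensor-space calculation involves details this sketch ignores.

```python
import numpy as np


def scaling_dimensions(rg_map, x_star, b, d, h=1e-6):
    """Linearize rg_map around its fixed point x_star and return scaling
    dimensions Delta = d - log|lambda| / log b, most relevant first."""
    x_star = np.asarray(x_star, dtype=float)
    m = x_star.size
    J = np.empty((m, m))
    for j in range(m):
        dx = np.zeros(m)
        dx[j] = h
        # Central difference: column j of the Jacobian at the fixed point.
        J[:, j] = (rg_map(x_star + dx) - rg_map(x_star - dx)) / (2.0 * h)
    lam = np.linalg.eigvals(J)
    lam = lam[np.argsort(-np.abs(lam))]  # largest |lambda| = most relevant
    return d - np.log(np.abs(lam)) / np.log(b)


# Hypothetical toy RG map with fixed point 0 (NOT a tensor RG step):
# eigenvalues 2^(2-0), 2^(2-1/8), 2^(2-1) mimic 2D Ising with b = 2.
M = np.diag([2.0 ** 2, 2.0 ** (2 - 1 / 8), 2.0 ** 1])


def toy_rg(x):
    # The quadratic term vanishes to first order at the fixed point x = 0.
    return M @ x + 0.1 * x ** 2


print(scaling_dimensions(toy_rg, np.zeros(3), b=2.0, d=2))
```

The printed values should be close to 0, 0.125, and 1. Relevant directions ($\lambda > 1$, i.e. $\Delta < d$) grow under the RG, and $\lambda = 1$ would mark a marginal one.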
Submitted 21 March, 2021; v1 submitted 16 February, 2021;
originally announced February 2021.