Deriving Algorithms for Triangular Tridiagonalization a (Skew-)Symmetric Matrix

Authors: Robert van de Geijn, Maggie Myers, RuQing G. Xu, Devin Matthews

Abstract: We apply the FLAME methodology to derive algorithms hand in hand with their proofs of correctness for the computation of the $ L T L^T $ decomposition (with and without pivoting) of a skew-symmetric matrix. The approach yields known as well as new algorithms, presented using the FLAME notation. A number of BLAS-like primitives are exposed at the core of blocked algorithms that can attain high performance. The insights can be easily extended to yield algorithms for computing the $ L T L^T $ decomposition of a symmetric matrix.

Submitted 17 November, 2023; originally announced November 2023.

Comments: 28 pages

arXiv:2302.08417 [pdf, other]

GEMMFIP: Unifying GEMM in BLIS

Authors: RuQing G. Xu, Field G. Van Zee, Robert A. van de Geijn

Abstract: Matrix libraries often focus on achieving high performance for problems considered to be either "small" or "large", as these two scenarios tend to respond best to different optimization strategies. We propose a unified technique for implementing matrix operations like general matrix multiplication (GEMM) that can achieve high performance for both small and large problem sizes. The key is to fuse packing -- an operation that copies data to a contiguous layout in memory and which is critical for large matrix performance -- with the first computational "pass" over that data. This boosts performance across the problem size spectrum. As a result, tuning general-purpose libraries becomes simpler since it obviates the need to carefully express and parameterize logic that chooses between a "small matrix" strategy and a "large matrix" strategy. A prototype implementation of the technique built with the BLAS-like Library Instantiation Software (BLIS) framework is described and performance on a range of architectures is reported.

Submitted 16 February, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

Comments: 16 pages, 7 figures, 2 algorithms

ACM Class: G.4

arXiv:2105.13098 [pdf, ps, other]

doi 10.1016/j.cpc.2022.108375

Optimized Implementation for Calculation and Fast-Update of Pfaffians Installed to the Open-Source Fermionic Variational Solver mVMC

Authors: RuQing G. Xu, Tsuyoshi Okubo, Synge Todo, Masatoshi Imada

Abstract: In this article, we present a high performance, portable and well templated implementation for computing and fast-updating Pfaffian and inverse of an even-ranked skew-symmetric (antisymmetric) matrix. It is achieved with a skew-symmetric, blocked variant of the Parlett-Reid algorithm and a blocked update scheme based on the Woodbury matrix identity. Installation of this framework into the geminal-wavefunction-based many-variable Variational Monte Carlo (mVMC) code boosts sampling performance to up to more than $6$ times without changing Markov chain's behavior. The implementation is based on an extension of the BLAS-like instantiation software (BLIS) framework which has optimized kernel for many state-of-the-art processors including Intel Skylake-X, AMD EPYC Rome and Fujitsu A64FX.

Submitted 8 April, 2022; v1 submitted 27 May, 2021; originally announced May 2021.

Comments: 18 pages, 3 figures and 3 tables. Source code presented in this work can be obtained from http://github.com/issp-center-dev/mVMC/tree/develop

arXiv:2102.08136 [pdf, other]

doi 10.1103/PhysRevResearch.3.023048

Scaling dimensions from linearized tensor renormalization group transformations

Authors: Xinliang Lyu, RuQing G. Xu, Naoki Kawashima

Abstract: We show a way to perform the canonical renormalization group (RG) prescription in tensor space: write down the tensor RG equation, linearize it around a fixed-point tensor, and diagonalize the resulting linearized RG equation to obtain scaling dimensions. The tensor RG methods have had a great success in producing accurate free energy compared with the conventional real-space RG schemes. However, the above-mentioned canonical procedure has not been implemented for general tensor-network-based RG schemes. We extend the success of the tensor methods further to extraction of scaling dimensions through the canonical RG prescription, without explicitly using the conformal field theory. This approach is benchmarked in the context of the Ising models in 1D and 2D. Based on a pure RG argument, the proposed method has potential applications to 3D systems, where the existing bread-and-butter method is inapplicable.

Submitted 21 March, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

Comments: 17 pages, 11 figures; move most technical details to appendices to make the essential idea clear

Journal ref: Phys. Rev. Research 3, 023048 (2021)

Showing 1–4 of 4 results for author: Xu, R G