-
Vectorization of Gradient Boosting of Decision Trees Prediction in the CatBoost Library for RISC-V Processors
Authors:
Evgeny Kozinov,
Evgeny Vasiliev,
Andrey Gorshkov,
Valentina Kustikova,
Artem Maklaev,
Valentin Volokitin,
Iosif Meyerov
Abstract:
The emergence and rapid development of the open RISC-V instruction set architecture opens up new horizons on the way to efficient devices, ranging from existing low-power IoT boards to future high-performance servers. The effective use of RISC-V CPUs requires software optimization for the target platform. In this paper, we focus on the RISC-V-specific optimization of the CatBoost library, one of t…
▽ More
The emergence and rapid development of the open RISC-V instruction set architecture opens up new horizons on the way to efficient devices, ranging from existing low-power IoT boards to future high-performance servers. The effective use of RISC-V CPUs requires software optimization for the target platform. In this paper, we focus on the RISC-V-specific optimization of the CatBoost library, one of the widely used implementations of gradient boosting for decision trees. The CatBoost library is deeply optimized for commodity CPUs and GPUs. However, vectorization is required to effectively utilize the resources of RISC-V CPUs with the RVV 0.7.1 vector extension, which cannot be done automatically with a C++ compiler yet. The paper reports on our experience in benchmarking CatBoost on the Lichee Pi 4a, RISC-V-based board, and shows how manual vectorization of computationally intensive loops with intrinsics can speed up the use of decision trees several times, depending on the specific workload. The developed codes are publicly available on GitHub.
△ Less
Submitted 17 May, 2024;
originally announced May 2024.
-
Improved vectorization of OpenCV algorithms for RISC-V CPUs
Authors:
V. D. Volokitin,
E. P. Vasiliev,
E. A. Kozinov,
V. D. Kustikova,
A. V. Liniov,
Y. A. Rodimkov,
A. V. Sysoyev,
I. B. Meyerov
Abstract:
The development of an open and free RISC-V architecture is of great interest for a wide range of areas, including high-performance computing and numerical simulation in mathematics, physics, chemistry and other problem domains. In this paper, we discuss the possibilities of accelerating computations on available RISC-V processors by improving the vectorization of several computer vision and machin…
▽ More
The development of an open and free RISC-V architecture is of great interest for a wide range of areas, including high-performance computing and numerical simulation in mathematics, physics, chemistry and other problem domains. In this paper, we discuss the possibilities of accelerating computations on available RISC-V processors by improving the vectorization of several computer vision and machine learning algorithms in the widely used OpenCV library. It is shown that improved vectorization speeds up computations on existing prototypes of RISC-V devices by tens of percent.
△ Less
Submitted 19 September, 2023;
originally announced November 2023.
-
Case Study for Running Memory-Bound Kernels on RISC-V CPUs
Authors:
Valentin Volokitin,
Evgeny Kozinov,
Valentina Kustikova,
Alexey Liniov,
Iosif Meyerov
Abstract:
The emergence of a new, open, and free instruction set architecture, RISC-V, has heralded a new era in microprocessor architectures. Starting with low-power, low-performance prototypes, the RISC-V community has a good chance of moving towards fully functional high-end microprocessors suitable for high-performance computing. Achieving progress in this direction requires comprehensive development of…
▽ More
The emergence of a new, open, and free instruction set architecture, RISC-V, has heralded a new era in microprocessor architectures. Starting with low-power, low-performance prototypes, the RISC-V community has a good chance of moving towards fully functional high-end microprocessors suitable for high-performance computing. Achieving progress in this direction requires comprehensive development of the software environment, namely operating systems, compilers, mathematical libraries, and approaches to performance analysis and optimization. In this paper, we analyze the performance of two available RISC-V devices when executing three memory-bound applications: a widely used STREAM benchmark, an in-place dense matrix transposition algorithm, and a Gaussian Blur algorithm. We show that, compared to x86 and ARM CPUs, RISC-V devices are still expected to be inferior in terms of computation time but are very good in resource utilization. We also demonstrate that well-developed memory optimization techniques for x86 CPUs improve the performance on RISC-V CPUs. Overall, the paper shows the potential of RISC-V as an alternative architecture for high-performance computing.
△ Less
Submitted 16 May, 2023;
originally announced May 2023.