Showing 1–2 of 2 results for author: Bauinger, C

  1. arXiv:2403.17607  [pdf, other]

    cs.AI

    Fully-fused Multi-Layer Perceptrons on Intel Data Center GPUs

    Authors: Kai Yuan, Christoph Bauinger, Xiangyi Zhang, Pascal Baehr, Matthias Kirchhart, Darius Dabert, Adrien Tousnakhoff, Pierre Boudier, Michael Paulitsch

    Abstract: This paper presents a SYCL implementation of Multi-Layer Perceptrons (MLPs), which targets and is optimized for the Intel Data Center GPU Max 1550. To increase the performance, our implementation minimizes the slow global memory accesses by maximizing the data reuse within the general register file and the shared local memory by fusing the operations in each layer of the MLP. We show with a simple…

    Submitted 26 March, 2024; originally announced March 2024.

  2. arXiv:2311.00368  [pdf, other]

    cs.LG cs.MS

    Performance Optimization of Deep Learning Sparse Matrix Kernels on Intel Max Series GPU

    Authors: Mohammad Zubair, Christoph Bauinger

    Abstract: In this paper, we focus on three sparse matrix operations that are relevant for machine learning applications, namely, the sparse-dense matrix multiplication (SPMM), the sampled dense-dense matrix multiplication (SDDMM), and the composition of the SDDMM with SPMM, also termed as FusedMM. We develop optimized implementations for SPMM, SDDMM, and FusedMM operations utilizing Intel oneAPI's Explicit…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: 20 pages, 1 Table, 19 Figures, preprint

    MSC Class: 68-04 (Primary) 68T07; 68W10 (Secondary)

    ACM Class: I.2.5; G.4