Showing 1–2 of 2 results for author: Kuper, R

Search v0.5.6 released 2020-02-24

arXiv:2305.02480 [pdf, other]

cs.AR cs.PF

doi 10.1145/3620665.3640401

A Quantitative Analysis and Guidelines of Data Streaming Accelerator in Modern Intel Xeon Scalable Processors

Authors: Reese Kuper, Ipoom Jeong, Yifan Yuan, Jiayu Hu, Ren Wang, Narayan Ranganathan, Nam Sung Kim

Abstract: As semiconductor power density is no longer constant with the technology process scaling down, modern CPUs are integrating capable data accelerators on chip, aiming to improve performance and efficiency for a wide range of applications and usages. One such accelerator is the Intel Data Streaming Accelerator (DSA) introduced in Intel 4th Generation Xeon Scalable CPUs (Sapphire Rapids). DSA targets… ▽ More As semiconductor power density is no longer constant with the technology process scaling down, modern CPUs are integrating capable data accelerators on chip, aiming to improve performance and efficiency for a wide range of applications and usages. One such accelerator is the Intel Data Streaming Accelerator (DSA) introduced in Intel 4th Generation Xeon Scalable CPUs (Sapphire Rapids). DSA targets data movement operations in memory that are common sources of overhead in datacenter workloads and infrastructure. In addition, it becomes much more versatile by supporting a wider range of operations on streaming data, such as CRC32 calculations, delta record creation/merging, and data integrity field (DIF) operations. This paper sets out to introduce the latest features supported by DSA, deep-dive into its versatility, and analyze its throughput benefits through a comprehensive evaluation. Along with the analysis of its characteristics, and the rich software ecosystem of DSA, we summarize several insights and guidelines for the programmer to make the most out of DSA, and use an in-depth case study of DPDK Vhost to demonstrate how these guidelines benefit a real application. △ Less

Submitted 29 January, 2024; v1 submitted 3 May, 2023; originally announced May 2023.

Comments: This paper has been accepted by ASPLOS'24. Please refer to the linked DOI for the official version of this paper
arXiv:2303.15375 [pdf, other]

cs.PF cs.AR

doi 10.1145/3613424.3614256

Demystifying CXL Memory with Genuine CXL-Ready Systems and Devices

Authors: Yan Sun, Yifan Yuan, Zeduo Yu, Reese Kuper, Chihun Song, **ghan Huang, Houxiang Ji, Siddharth Agarwal, Jiaqi Lou, Ipoom Jeong, Ren Wang, Jung Ho Ahn, Tianyin Xu, Nam Sung Kim

Abstract: The ever-growing demands for memory with larger capacity and higher bandwidth have driven recent innovations on memory expansion and disaggregation technologies based on Compute eXpress Link (CXL). Especially, CXL-based memory expansion technology has recently gained notable attention for its ability not only to economically expand memory capacity and bandwidth but also to decouple memory technolo… ▽ More The ever-growing demands for memory with larger capacity and higher bandwidth have driven recent innovations on memory expansion and disaggregation technologies based on Compute eXpress Link (CXL). Especially, CXL-based memory expansion technology has recently gained notable attention for its ability not only to economically expand memory capacity and bandwidth but also to decouple memory technologies from a specific memory interface of the CPU. However, since CXL memory devices have not been widely available, they have been emulated using DDR memory in a remote NUMA node. In this paper, for the first time, we comprehensively evaluate a true CXL-ready system based on the latest 4th-generation Intel Xeon CPU with three CXL memory devices from different manufacturers. Specifically, we run a set of microbenchmarks not only to compare the performance of true CXL memory with that of emulated CXL memory but also to analyze the complex interplay between the CPU and CXL memory in depth. This reveals important differences between emulated CXL memory and true CXL memory, some of which will compel researchers to revisit the analyses and proposals from recent work. Next, we identify opportunities for memory-bandwidth-intensive applications to benefit from the use of CXL memory. Lastly, we propose a CXL-memory-aware dynamic page allocation policy, Caption to more efficiently use CXL memory as a bandwidth expander. We demonstrate that Caption can automatically converge to an empirically favorable percentage of pages allocated to CXL memory, which improves the performance of memory-bandwidth-intensive applications by up to 24% when compared to the default page allocation policy designed for traditional NUMA systems. △ Less

Submitted 4 October, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

Comments: This paper has been accepted by MICRO'23. Please refer to the https://doi.org/10.1145/3613424.3614256 for the official version of this paper

ACM Class: C.4; D.4; C.0

Search v0.5.6 released 2020-02-24