-
In-Storage Domain-Specific Acceleration for Serverless Computing
Authors:
Rohan Mahapatra,
Soroush Ghodrati,
Byung Hoon Ahn,
Sean Kinzer,
Shu-ting Wang,
Hanyang Xu,
Lavanya Karthikeyan,
Hardik Sharma,
Amir Yazdanbakhsh,
Mohammad Alian,
Hadi Esmaeilzadeh
Abstract:
While (1) serverless computing is emerging as a popular form of cloud execution, datacenters are going through major changes: (2) storage disaggregation at the system infrastructure level and (3) integration of domain-specific accelerators at the hardware level. Each of these three trends individually provides significant benefits; however, when combined the benefits diminish. Specifically, the paper makes the key observation that for serverless functions, the overhead of accessing disaggregated persistent storage overshadows the gains from accelerators. Therefore, to benefit from all these trends in conjunction, we propose Domain-Specific Computational Storage for Serverless (DSCS-Serverless). This idea contributes a serverless model that leverages a programmable accelerator within computational storage to combine the benefits of acceleration and storage disaggregation. Our results with eight applications show that integrating a comparatively small accelerator within the storage (DSCS-Serverless), one that fits within its power constraints (15 Watts), significantly outperforms a traditional disaggregated system that utilizes the NVIDIA RTX 2080 Ti GPU (250 Watts). Further, the work highlights that disaggregation, the serverless model, and the limited power budget for computation in storage require a different design than the conventional practices of integrating microprocessors and FPGAs. This insight is in contrast with current practices of designing computational storage that are yet to address the challenges associated with the shifts in datacenters. In comparison with two such conventional designs that use either a quad-core ARM A57 or a Xilinx FPGA, DSCS-Serverless provides 3.7x and 1.7x end-to-end application speedup, 4.3x and 1.9x energy reduction, and 3.2x and 2.3x higher cost efficiency, respectively.
Submitted 23 March, 2024; v1 submitted 6 March, 2023;
originally announced March 2023.
-
Enabling Kernel Bypass Networking on gem5
Authors:
Siddharth Agarwal,
Minwoo Lee,
Ren Wang,
Mohammad Alian
Abstract:
Full-system simulation of computer systems is critical to capture the complex interplay between various hardware and software components in future systems. Modeling the network subsystem is indispensable to the fidelity of the full-system simulation due to the increasing importance of scale-out systems. The network software stack has undergone major changes over the last decade, and kernel-bypass networking stacks and data-plane networks are rapidly replacing the conventional kernel network stack. Nevertheless, the current state-of-the-art architectural simulators still use kernel networking, which precludes realistic network application scenarios. In this work, we enable a kernel-bypass networking stack on gem5, the state-of-the-art full-system architectural simulator. We extend gem5's NIC hardware model and device driver to support userspace device drivers so that the DPDK framework can run. We also implement a network load generator hardware model in gem5 to generate various traffic patterns and perform per-packet timestamp and latency measurements without introducing packet loss. Our experimental results show that DPDK's simulated network bandwidth scales with the number of cores and NIC ports. As two use cases, we analyze the sensitivity of (1) network performance to several microarchitectural parameters, and (2) direct cache access (DCA) technology to DPDK burst size.
Submitted 27 January, 2023; v1 submitted 23 January, 2023;
originally announced January 2023.
-
AbGradCon 2021: Lessons in Digital Meetings, International Collaboration, and Interdisciplinarity in Astrobiology
Authors:
Tony Z. Jia,
Kristin N. Johnson-Finn,
Osama M. Alian,
Irene Bonati,
Kosuke Fujishima,
Natalie Grefenstette,
Thilina Heenatigala,
Yamei Li,
Natsumi Noda,
Petar I. Penev,
Paula Prondzinsky,
Harrison B. Smith
Abstract:
The Astrobiology Graduate Conference (AbGradCon) is an annual conference organized both for and by early career researchers, postdoctoral fellows, and students as a way to train the next generation of astrobiologists and develop a robust network of cohorts moving forward. AbGradCon 2021 was held virtually on September 14-17, 2021, hosted by the Earth-Life Science Institute (ELSI) of Tokyo Institute of Technology after postponement of the in-person event in 2020 due to the COVID-19 pandemic. The meeting consisted of presentations by 120 participants from a variety of fields, two keynote speakers, and other career-building events and workshops. Here, we report on the organizational and executional aspects of AbGradCon 2021, including the meeting participant demographics, various digital aspects introduced specifically for a virtual edition of the meeting, and the abstract submission and evaluation process. The abstract evaluation process of AbGradCon 2021 is unique in that all evaluations are done by the peers of the applicants, and as astrobiology is inherently a broad discipline, the abstract evaluation process revealed a number of trends related to the multidisciplinarity of the astrobiology field. We believe that meetings like AbGradCon can provide a unique opportunity for students and early career researchers in astrobiology to experience community building, inter- and multidisciplinary collaboration, and career training, and would be a welcome sight in other fields as well. We hope that this report provides inspiration and a basic roadmap for organizing future conferences in any field with similar goals.
Submitted 24 February, 2022;
originally announced February 2022.
-
IOCA: High-Speed I/O-Aware LLC Management for Network-Centric Multi-Tenant Platform
Authors:
Yifan Yuan,
Mohammad Alian,
Yipeng Wang,
Ilia Kurakin,
Ren Wang,
Charlie Tai,
Nam Sung Kim
Abstract:
In modern server CPUs, the last-level cache (LLC) is a critical hardware resource that exerts significant influence on workload performance, and LLC management is key to performance isolation and QoS in the multi-tenant cloud. In this paper, we argue that besides CPU cores, high-speed network I/O is also important for LLC management. This is because of an Intel architectural innovation, Data Direct I/O (DDIO), that directly injects inbound I/O traffic into (part of) the LLC instead of main memory. We summarize two problems caused by DDIO and show that (1) the default DDIO configuration may not always achieve optimal performance, and (2) DDIO can decrease the performance of non-I/O workloads that share the LLC with it by as much as 32%.
We then present IOCA, the first LLC management mechanism for network-centric platforms that treats I/O as a first-class citizen. IOCA monitors and analyzes the performance of the cores, LLC, and DDIO using the CPU's hardware performance counters, and adaptively adjusts the number of LLC ways for DDIO or the tenants that demand more LLC capacity. In addition, IOCA dynamically chooses the tenants that share LLC resources with DDIO, to minimize the performance interference from both the tenants and the I/O. Our experiments with multiple microbenchmarks and real-world applications in two major end-host network models demonstrate that IOCA can effectively reduce the performance degradation caused by DDIO, with minimal overhead.
Submitted 4 March, 2021; v1 submitted 9 July, 2020;
originally announced July 2020.
-
The gem5 Simulator: Version 20.0+
Authors:
Jason Lowe-Power,
Abdul Mutaal Ahmad,
Ayaz Akram,
Mohammad Alian,
Rico Amslinger,
Matteo Andreozzi,
Adrià Armejach,
Nils Asmussen,
Brad Beckmann,
Srikant Bharadwaj,
Gabe Black,
Gedare Bloom,
Bobby R. Bruce,
Daniel Rodrigues Carvalho,
Jeronimo Castrillon,
Lizhong Chen,
Nicolas Derumigny,
Stephan Diestelhorst,
Wendy Elsasser,
Carlos Escuin,
Marjan Fariborz,
Amin Farmahini-Farahani,
Pouya Fotouhi,
Ryan Gambord,
Jayneel Gandhi
, et al. (53 additional authors not shown)
Abstract:
The open-source and community-supported gem5 simulator is one of the most popular tools for computer architecture research. This simulation infrastructure allows researchers to model modern computer hardware at the cycle level, and it has enough fidelity to boot unmodified Linux-based operating systems and run full applications for multiple architectures including x86, Arm, and RISC-V. The gem5 simulator has been under active development over the last nine years since the original gem5 release. In this time, there have been over 7500 commits to the codebase from over 250 unique contributors, which have improved the simulator by adding new features, fixing bugs, and increasing the code quality. In this paper, we give an overview of gem5's usage and features, describe the current state of the gem5 simulator, and enumerate the major changes since the initial release of gem5. We also discuss how the gem5 simulator has transitioned to a formal governance model to enable continued improvement and community support for the next 20 years of computer architecture research.
Submitted 29 September, 2020; v1 submitted 6 July, 2020;
originally announced July 2020.
-
Separation Theorems for Phase-Incoherent Multiple-User Channels
Authors:
Hamidreza Ebrahimzadeh Saffar,
Ehsan Haj Mirza Alian,
Patrick Mitran
Abstract:
We study the transmission of two correlated and memoryless sources $(U,V)$ over several multiple-user phase-asynchronous channels. Namely, we consider a class of phase-incoherent multiple access relay channels (MARC) with both non-causal and causal unidirectional cooperation between encoders, referred to as phase-incoherent unidirectional non-causal cooperative MARC (PI-UNCC-MARC) and phase-incoherent unidirectional causal cooperative MARC (PI-UCC-MARC), respectively. We also consider phase-incoherent interference channel (PI-IC) and interference relay channel (PI-IRC) models in the same context. In all cases, the input signals are assumed to undergo non-ergodic phase shifts due to the channel. As a realistic assumption, the shifts are taken to be unknown to the transmitters and known to the receivers. Both necessary and sufficient conditions for reliably sending the correlated sources to the destinations over the considered channels are derived. In particular, for all of the channel models, we first derive an outer bound for reliable communication that is defined with respect to the source entropy content (i.e., the triple $(H(U|V),H(V|U),H(U,V))$). Then, using separate source and channel coding, under specific gain conditions, we establish the same region as the inner bound and therefore obtain tight conditions for reliable communication for the specific channel under study. We thus establish a source-channel separation theorem for each channel and conclude that without knowledge of the phase shifts at the transmitter sides, separation is optimal. It is further conjectured that separation in general is optimal for all channel coefficients.
Submitted 13 October, 2011;
originally announced October 2011.