Workload Intelligence: Punching Holes Through the Cloud Abstraction
Authors:
Lexiang Huang,
Anjaly Parayil,
Jue Zhang,
Xiaoting Qin,
Chetan Bansal,
Jovan Stojkovic,
Pantea Zardoshti,
Pulkit Misra,
Eli Cortez,
Raphael Ghelman,
Íñigo Goiri,
Saravan Rajmohan,
Jim Kleewein,
Rodrigo Fonseca,
Timothy Zhu,
Ricardo Bianchini
Abstract:
Today, cloud workloads are essentially opaque to the cloud platform. Typically, the only information the platform receives is the virtual machine (VM) type and possibly a decoration to the type (e.g., the VM is evictable). Similarly, workloads receive little to no information from the platform; generally, workloads might receive telemetry from their VMs or exceptional signals (e.g., shortly before a VM is evicted). The narrow interface between workloads and platforms has several drawbacks: (1) a surge in VM types and decorations in public cloud platforms complicates customer selection; (2) essential workload characteristics (e.g., low availability requirements, high latency tolerance) are often unspecified, hindering platform customization for optimized resource usage and cost savings; and (3) workloads may be unaware of potential optimizations or lack sufficient time to react to platform events.
In this paper, we propose a framework, called Workload Intelligence (WI), for dynamic bi-directional communication between cloud workloads and the cloud platform. Via WI, workloads can programmatically adjust their key characteristics and requirements, and even dynamically adapt behaviors such as VM priorities. In the other direction, WI allows the platform to programmatically inform workloads about upcoming events and opportunities for optimization, among other scenarios. Because of WI, the cloud platform can drastically simplify its offerings, reduce its costs without fear of violating any workload requirements, and reduce prices to its customers by 48.8% on average.
Submitted 29 April, 2024;
originally announced April 2024.
Pond: CXL-Based Memory Pooling Systems for Cloud Platforms
Authors:
Huaicheng Li,
Daniel S. Berger,
Stanko Novakovic,
Lisa Hsu,
Dan Ernst,
Pantea Zardoshti,
Monish Shah,
Samir Rajadnya,
Scott Lee,
Ishwar Agarwal,
Mark D. Hill,
Marcus Fontoura,
Ricardo Bianchini
Abstract:
Public cloud providers seek to meet stringent performance requirements and low hardware cost. A key driver of performance and cost is main memory. Memory pooling promises to improve DRAM utilization and thereby reduce costs. However, pooling is challenging under cloud performance requirements. This paper proposes Pond, the first memory pooling system that both meets cloud performance goals and significantly reduces DRAM cost. Pond builds on the Compute Express Link (CXL) standard for load/store access to pool memory and two key insights. First, our analysis of cloud production traces shows that pooling across 8-16 sockets is enough to achieve most of the benefits. This enables a small-pool design with low access latency. Second, it is possible to create machine learning models that can accurately predict how much local and pool memory to allocate to a virtual machine (VM) to resemble same-NUMA-node memory performance. Our evaluation with 158 workloads shows that Pond reduces DRAM costs by 7% with performance within 1-5% of same-NUMA-node VM allocations.
Submitted 21 October, 2022; v1 submitted 1 March, 2022;
originally announced March 2022.