-
INTERPLAY: An Intelligent Model for Predicting Performance Degradation due to Multi-cache Way-disabling
Authors:
Panagiota Nikolaou,
Yiannakis Sazeides,
Maria K. Michael
Abstract:
Modern and future processors need to remain functionally correct in the presence of permanent faults to sustain scaling benefits and limit field returns. This paper presents a combined analytical and microarchitectural simulation-based framework, called INTERPLAY, that can rapidly predict, at design time, the performance degradation expected from a processor employing way-disabling to handle permanent faults in caches while in the field. The proposed model can predict a program's performance with an accuracy of up to 98.40% for a processor with a two-level cache hierarchy, when multiple caches suffer from faults and need to disable one or more of their ways. INTERPLAY is 9.2x faster than an exhaustive simulation approach since it only needs the training simulation runs for the single-cache way-disabling configurations to predict the performance for any multi-cache way-disabling configuration.
Submitted 23 June, 2022;
originally announced June 2022.
-
AgilePkgC: An Agile System Idle State Architecture for Energy Proportional Datacenter Servers
Authors:
Georgia Antoniou,
Haris Volos,
Davide B. Bartolini,
Tom Rollet,
Yiannakis Sazeides,
Jawad Haj Yahya
Abstract:
This paper presents the design of AgilePkgC (APC): a new C-state architecture that improves the energy proportionality of servers that operate at low utilization while running microservices of user-facing applications. APC targets the reduction of power when all cores are idle in a shallow C-state, ready to transition back to service. In particular, APC targets the power of the resources shared by the cores (e.g., LLC, network-on-chip, IOs, DRAM) which remain active while no core is active to use them. APC realizes its objective by using low-overhead hardware to facilitate sub-microsecond entry/exit latency to a new package C-state and judiciously selecting intermediate power modes for the different shared resources that offer fast transition and, yet, substantial power savings. Our experimental evaluation supports that APC holds the potential to reduce server power by up to 41% with a worst-case performance degradation of less than 0.1% for several representative workloads. Our results clearly support the research and development and eventual adoption of new deep and fast package C-states, like APC, for future server CPUs targeting datacenters running microservices.
Submitted 21 April, 2022;
originally announced April 2022.
-
AgileWatts: An Energy-Efficient CPU Core Idle-State Architecture for Latency-Sensitive Server Applications
Authors:
Jawad Haj Yahya,
Haris Volos,
Davide B. Bartolini,
Georgia Antoniou,
Jeremie S. Kim,
Zhe Wang,
Kleovoulos Kalaitzidis,
Tom Rollet,
Zhirui Chen,
Ye Geng,
Onur Mutlu,
Yiannakis Sazeides
Abstract:
User-facing applications running in modern datacenters exhibit irregular request patterns and are implemented using a multitude of services with tight latency requirements. These characteristics render ineffective existing energy conserving techniques when processors are idle due to the long transition time from a deep idle power state (C-state). While prior works propose management techniques to mitigate this inefficiency, we tackle it at its root with AgileWatts (AW): a new deep C-state architecture optimized for datacenter server processors targeting latency-sensitive applications. AW is based on three key ideas. First, AW eliminates the latency overhead of saving/restoring the core context (i.e., micro-architectural state) when powering-off/-on the core in a deep idle power state by i) implementing medium-grained power-gates, carefully distributed across the CPU core, and ii) retaining context in the power-ungated domain. Second, AW eliminates the flush latency overhead (several tens of microseconds) of the L1/L2 caches when entering a deep idle power state by keeping L1/L2 cache content power-ungated. A minimal control logic also remains power-ungated to serve cache coherence traffic (i.e., snoops) seamlessly. AW implements sleep-mode in caches to reduce cache leakage power consumption and lowers the core voltage to the minimum operational voltage level to minimize the leakage power of the power-ungated domain. Third, using a state-of-the-art power efficient all-digital phase-locked loop (ADPLL) clock generator, AW keeps the PLL active and locked during the idle state, further cutting precious microseconds of wake-up latency at a negligible power cost. Our evaluation with an accurate simulator calibrated against an Intel Skylake server shows that AW reduces the energy consumption of Memcached by up to 71% (35% on average) with up to 1% performance degradation.
Submitted 4 October, 2022; v1 submitted 4 March, 2022;
originally announced March 2022.
-
DarkGates: A Hybrid Power-Gating Architecture to Mitigate the Performance Impact of Dark-Silicon in High Performance Processors
Authors:
Jawad Haj Yahya,
Jeremie S. Kim,
A. Giray Yaglikci,
Jisung Park,
Efraim Rotem,
Yanos Sazeides,
Onur Mutlu
Abstract:
To reduce the leakage power of inactive (dark) silicon components, modern processor systems shut off these components' power supply using low-leakage transistors, called power-gates. Unfortunately, power-gates increase the system's power-delivery impedance and voltage guardband, limiting the system's maximum attainable voltage (i.e., Vmax) and, thus, the CPU core's maximum attainable frequency (i.e., Fmax). As a result, systems that are performance constrained by the CPU frequency (i.e., Fmax-constrained), such as high-end desktops, suffer significant performance loss due to power-gates.
To mitigate this performance loss, we propose DarkGates, a hybrid system architecture that increases the performance of Fmax-constrained systems while fulfilling their power efficiency requirements. DarkGates is based on three key techniques: i) bypassing on-chip power-gates using package-level resources (called bypass mode), ii) extending power management firmware to support operation either in bypass mode or normal mode, and iii) introducing deeper idle power states.
We implement DarkGates on an Intel Skylake microprocessor for client devices and evaluate it using a wide variety of workloads. On a real 4-core Skylake system with integrated graphics, DarkGates improves the average performance of SPEC CPU2006 workloads across all thermal design power (TDP) levels (35W-91W) between 4.2% and 5.3%. DarkGates maintains the performance of 3DMark workloads for desktop systems with TDP greater than 45W while for a 35W-TDP (the lowest TDP) desktop it experiences only a 2% degradation. In addition, DarkGates fulfills the requirements of the ENERGY STAR and the Intel Ready Mode energy efficiency benchmarks of desktop systems.
Submitted 21 December, 2021;
originally announced December 2021.
-
A Methodology for Oracle Selection of Monitors and Knobs for Configuring an HPC System running a Flood Management Application
Authors:
Panagiota Nikolaou,
Yiannakis Sazeides,
Antoni Portero,
Radim Vavrik,
Vit Vondrak
Abstract:
This paper defines a methodology for the oracle selection of the monitors and knobs to use to configure an HPC system running a scientific application while satisfying the application's requirements and not violating any system constraints. This methodology relies on a heuristic correlation analysis between requirements, monitors and knobs to determine the minimum subset of monitors to observe and knobs to explore in order to find the optimal system configuration for the HPC application. At the end of this analysis, we reduce an 11-dimensional space to a 3-dimensional space for monitors and a 6-dimensional space to a 3-dimensional space for knobs. This reduction shows the potential and highlights the need for a realistic methodology to help identify such a minimum set of monitors and knobs.
Submitted 22 February, 2017;
originally announced February 2017.