-
A Two-Level Thermal Cycling-aware Task Mapping Technique for Reliability Management in Manycore Systems
Authors:
Fatemeh Hossein Khani,
Omid Akbari,
Muhammad Shafique
Abstract:
Reliability management is one of the primary concerns in manycore systems design. Different aging mechanisms, such as Negative-Bias Temperature Instability (NBTI), Electromigration (EM), and thermal cycling (TC), can reduce the reliability of these systems. However, state-of-the-art works have mainly focused on NBTI and EM, whereas only a few works have considered the thermal cycling effect. Thermal cycling can significantly shorten the system's lifetime. Moreover, the thermal effects of cores on each other due to their adjacency may also influence the system's Mean Time to Failure (MTTF). This paper introduces a new technique to manage the reliability of manycore systems. The technique considers thermal cycling, the adjacency of cores, and the process variation-induced diversity of operating frequencies. It uses two levels of task mapping to improve system lifetime. At the first level, cores with close temperatures are packed into the same bin, and an arriving task is assigned to a bin with a similar temperature. At the second level, the task is assigned to a core inside the bin selected at the first level, based on performance requirements and the core frequency. Compared to conventional TC-aware techniques, the proposed method operates at a higher level (the bin level) to reduce the thermal variations of cores inside a bin, and it improves the system's thermal-cycling MTTF (MTTF_TC), making it a promising solution for manycore systems. The efficacy of the proposed technique is evaluated on 16-, 32-, 64-, and 256-core systems using applications from the SPLASH2 and PARSEC benchmark suites. The results show up to a 20% increase in MTTF_TC compared to conventional thermal cycling-aware task mapping techniques.
Submitted 10 March, 2024;
originally announced March 2024.
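The two mapping levels described in the abstract above can be illustrated with a small sketch. It is not the authors' algorithm; it assumes (hypothetically) fixed-width temperature bins, a per-task expected temperature `task_temp_c`, and a second-level policy that picks the slowest idle core still meeting the task's frequency requirement. The names `Core`, `bin_cores`, `map_task`, and `bin_width_c` are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class Core:
    core_id: int
    temp_c: float       # current core temperature (deg C)
    freq_ghz: float     # process-variation-affected maximum frequency
    busy: bool = False


def bin_cores(cores, bin_width_c):
    """Level 1 (assumed policy): pack cores into fixed-width temperature bins."""
    bins = {}
    for core in cores:
        key = int(core.temp_c // bin_width_c)
        bins.setdefault(key, []).append(core)
    return bins


def map_task(cores, task_temp_c, required_freq_ghz, bin_width_c=5.0):
    """Return the core chosen for a task, or None if no idle core qualifies."""
    bins = bin_cores(cores, bin_width_c)
    target = task_temp_c / bin_width_c
    # Level 1: try bins whose temperature range is closest to the task's
    # (hypothetical) expected temperature first.
    for key in sorted(bins, key=lambda k: abs((k + 0.5) - target)):
        candidates = [c for c in bins[key]
                      if not c.busy and c.freq_ghz >= required_freq_ghz]
        if candidates:
            # Level 2 (assumed policy): slowest core that still meets the
            # performance requirement, keeping faster cores available.
            chosen = min(candidates, key=lambda c: c.freq_ghz)
            chosen.busy = True
            return chosen
    return None


if __name__ == "__main__":
    cores = [Core(0, 62.0, 2.0), Core(1, 64.0, 2.4),
             Core(2, 78.0, 2.6), Core(3, 77.0, 2.2)]
    for _ in range(4):
        core = map_task(cores, task_temp_c=79.0, required_freq_ghz=2.1)
        print(core.core_id if core else None)
```

With these inputs, the first two tasks land in the hot bin (cores 3 and then 2), the third falls back to the cooler bin (core 1, since core 0 is too slow), and the fourth finds no qualifying core.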
-
X-Rel: Energy-Efficient and Low-Overhead Approximate Reliability Framework for Error-Tolerant Applications Deployed in Critical Systems
Authors:
Jafar Vafaei,
Omid Akbari,
Muhammad Shafique,
Christian Hochberger
Abstract:
Triple Modular Redundancy (TMR) is one of the most common techniques in fault-tolerant systems, in which the output is determined by a majority voter. However, the design diversity of replicated modules and/or soft errors, which are more likely to occur in the nanoscale era, may affect the majority voting scheme. Besides, the significant overheads of the TMR scheme may limit its use in energy- and area-constrained critical systems. On the other hand, for most inherently error-resilient applications, such as image processing and vision, deployed in critical systems (e.g., autonomous vehicles and robotics), achieving a given level of reliability takes priority over precise results. Therefore, these applications can benefit from the approximate computing paradigm to achieve higher energy efficiency and lower area. This paper proposes an energy-efficient approximate reliability (X-Rel) framework to overcome the aforementioned challenges of TMR systems and exploit the full potential of approximate computing without sacrificing the desired reliability constraint and output quality. The X-Rel framework relaxes the precision of the voter based on a systematic error-bounding method that leverages user-defined quality and reliability constraints. Afterward, the size of the resulting voter is used to approximate the TMR modules such that the overall area and energy consumption are minimized. The effectiveness of employing the proposed X-Rel technique in a TMR structure is evaluated in a 15-nm FinFET technology for different quality constraints and various reliability bounds. The results show that the X-Rel voter reduces delay, area, and energy consumption by up to 86%, 87%, and 98%, respectively, compared to state-of-the-art approximate TMR voters.
Submitted 4 July, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
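To make the idea of a precision-relaxed voter concrete, the sketch below contrasts an exact bitwise TMR majority voter with a hypothetical approximate voter that votes only on the bits above the k least-significant bits and outputs those k bits as zero. The rule for choosing k (largest k whose worst-case unsigned error, 2^k - 1, stays within a user-given bound) is a simplified stand-in for a quality constraint only; it is not the X-Rel error-bounding method, which also accounts for reliability constraints. Function names and parameters are illustrative.

```python
def majority(a: int, b: int, c: int) -> int:
    """Exact bitwise TMR voter: each output bit is the majority of the three input bits."""
    return (a & b) | (b & c) | (a & c)


def approx_majority(a: int, b: int, c: int, k: int) -> int:
    """Hypothetical approximate voter: only bits above the k LSBs are voted; the k LSBs are output as 0."""
    keep = ~((1 << k) - 1)  # mask that clears the k least-significant bits
    return majority(a & keep, b & keep, c & keep)


def largest_k(max_abs_error: int, width: int) -> int:
    """Largest k (at most width) whose worst-case error 2**k - 1 stays within max_abs_error."""
    k = 0
    while k < width and (1 << (k + 1)) - 1 <= max_abs_error:
        k += 1
    return k


if __name__ == "__main__":
    k = largest_k(max_abs_error=7, width=8)   # quality bound of 7 allows k = 3
    print(majority(203, 203, 13))             # exact vote masks the faulty third module
    print(approx_majority(203, 203, 13, k))   # same vote with the 3 LSBs dropped
```

Dropping the low-order voter bits removes hardware while keeping the output error bounded, which is the kind of accuracy-for-cost trade-off the abstract describes; a real design would also size the replicated modules to match the voter.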
-
An Energy-Efficient Generic Accuracy Configurable Multiplier Based on Block-Level Voltage Overscaling
Authors:
Ali Akbar Bahoo,
Omid Akbari,
Muhammad Shafique
Abstract:
Voltage Overscaling (VOS) is one of the well-known techniques for increasing the energy efficiency of arithmetic units. It can also provide significant lifetime improvements while still meeting the accuracy requirements of inherently error-resilient applications. This paper proposes a generic accuracy-configurable multiplier that applies VOS at a coarse-grained level (block level) to reduce the control logic required for applying VOS and its associated overheads, thus enabling a highly flexible trade-off between energy consumption and output quality. The proposed configurable Block-Level VOS-based (BL-VOS) multiplier employs VOS in a multiplier composed of smaller blocks, where applying VOS to different blocks yields structures with various output accuracy levels. To evaluate the proposed concept, we implement 8-bit and 16-bit BL-VOS multipliers with various block widths in a 15-nm FinFET technology. The results show that the proposed multiplier achieves up to 15% lower energy consumption and up to 21% higher output accuracy compared to state-of-the-art VOS-based multipliers. The effects of Process Variation (PV) and Bias Temperature Instability (BTI)-induced delay on the proposed multiplier are also investigated. Finally, the effectiveness of the proposed multiplier is studied for two different image processing applications in terms of quality and energy efficiency.
Submitted 4 July, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
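Why overscaling different blocks yields different accuracy levels can be seen in a behavioral sketch of an 8-bit multiplier built from four 4x4 blocks, using a = a_H * 16 + a_L and b = b_H * 16 + b_L, so that a * b = (a_H * b_H) << 8 + (a_H * b_L + a_L * b_H) << 4 + a_L * b_L. The VOS error model here is a crude, hypothetical one (an overscaled block loses the top bit of its 8-bit result); it is not taken from the paper, and the names `BLOCK`, `vos_block`, `bl_vos_multiply`, and `max_error` are illustrative.

```python
BLOCK = 4  # hypothetical block width: an 8x8 multiplier built from four 4x4 blocks


def block_products(a: int, b: int) -> dict:
    """Split 8-bit operands into 4-bit halves; return {block: (shift, partial product)}."""
    mask = (1 << BLOCK) - 1
    al, ah = a & mask, a >> BLOCK
    bl, bh = b & mask, b >> BLOCK
    return {
        "LL": (0, al * bl),
        "LH": (BLOCK, al * bh),
        "HL": (BLOCK, ah * bl),
        "HH": (2 * BLOCK, ah * bh),
    }


def vos_block(p: int) -> int:
    """Hypothetical VOS error model: the overscaled block drops the MSB of its 8-bit result."""
    return p & ((1 << (2 * BLOCK - 1)) - 1)


def bl_vos_multiply(a: int, b: int, overscaled=()) -> int:
    """Sum the shifted block outputs; blocks named in `overscaled` use the error model."""
    total = 0
    for name, (shift, p) in block_products(a, b).items():
        if name in overscaled:
            p = vos_block(p)
        total += p << shift
    return total


def max_error(overscaled) -> int:
    """Worst-case absolute error over all 8-bit operand pairs."""
    return max(abs(a * b - bl_vos_multiply(a, b, overscaled))
               for a in range(256) for b in range(256))


if __name__ == "__main__":
    for config in [(), ("LL",), ("LH", "HL"), ("HH",)]:
        print(config, max_error(config))
```

Under this model, overscaling only the least-significant block bounds the error at 128, while overscaling the most-significant block allows errors up to 32768, so the choice of overscaled blocks acts as an accuracy knob.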