-
Towards Distributed Software Resilience in Asynchronous Many-Task Programming Models
Authors:
Nikunj Gupta,
Jackson R. Mayo,
Adrian S. Lemoine,
Hartmut Kaiser
Abstract:
Exceptions and errors that occur within mission-critical applications due to hardware failures have a high cost. With the emerging Next Generation Platforms (NGPs), the rate of hardware failures will likely increase. Designing applications to be resilient is therefore critical to retaining the reliability of results while meeting constraints on power budgets. In this paper, we discuss software resilience in asynchronous many-task (AMT) programming models at both local and distributed scale. We choose HPX to prototype our resiliency designs. We implement two resiliency APIs that we expose to application developers, namely task replication and task replay. Task replication launches n copies of a task and executes them asynchronously. Task replay reschedules a task up to n times until a valid output is returned. Furthermore, we expose algorithm-based fault tolerance (ABFT) using user-provided predicates (e.g., checksums) to validate the returned results. We benchmark the resiliency scheme for both synthetic and real-world applications at local and distributed scale and show that most of the added execution time arises from the replay, replication, or data movement of the tasks and not from the boilerplate code added to achieve resilience.
Submitted 19 October, 2020;
originally announced October 2020.
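The task replay with checksum validation described in this abstract can be illustrated with a minimal, synchronous Python sketch. It is not HPX's API: `replay_validate`, `ReplayExhausted`, `make_flaky_sum`, and `checksum_ok` are hypothetical names, and the simulated corruption stands in for a hardware fault.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


class ReplayExhausted(RuntimeError):
    """Raised when no attempt produced a result accepted by the predicate."""


def replay_validate(n: int, predicate: Callable[[T], bool],
                    task: Callable[..., T], *args) -> T:
    """Run `task` up to `n` times and return the first validated result."""
    if n < 1:
        raise ValueError("n must be at least 1")
    for _ in range(n):
        try:
            result = task(*args)
        except Exception:
            # A failed attempt is treated like an invalid result: try again.
            continue
        if predicate(result):
            return result
    raise ReplayExhausted(f"no valid result after {n} attempts")


def make_flaky_sum(bad_attempts: int) -> Callable[[Sequence[int]], Tuple[list, int]]:
    """Hypothetical task that corrupts its output on the first `bad_attempts` calls."""
    calls = {"count": 0}

    def flaky_sum(values: Sequence[int]) -> Tuple[list, int]:
        calls["count"] += 1
        out = list(values)
        if calls["count"] <= bad_attempts:
            out[0] += 1  # simulate a silent data corruption
        # The checksum is computed from the uncorrupted input.
        return out, sum(values)

    return flaky_sum


def checksum_ok(result: Tuple[list, int]) -> bool:
    """User-provided predicate: the payload must match its checksum."""
    values, checksum = result
    return sum(values) == checksum


if __name__ == "__main__":
    task = make_flaky_sum(bad_attempts=2)
    print(replay_validate(5, checksum_ok, task, [1, 2, 3]))
```

In this sketch the first two attempts fail the checksum and the third is accepted, so the cost of resilience is dominated by the repeated task executions rather than by the wrapper itself.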
-
DMR API: Improving cluster productivity by turning applications into malleable
Authors:
Sergio Iserte,
Rafael Mayo,
Enrique S. Quintana-Orti,
Vicenc Beltran,
Antonio J. Peña
Abstract:
Adaptive workloads can change the configuration of their jobs on the fly, in terms of the number of processes. To carry out these job reconfigurations, we have designed a methodology that enables a job to communicate with the resource manager and, through the runtime, to change its number of MPI ranks. The collaboration between the workload manager (aware of the job queue and the resource allocation) and the parallel runtime (able to transparently handle the processes and the program data) is crucial for our throughput-aware malleability methodology. Hence, when a job triggers a reconfiguration, the resource manager checks the cluster status and returns an action: an expansion, if there are spare resources; a shrink, if queued jobs can be initiated; or none, if no change can improve the global productivity. In this paper, we describe the internals of our framework and how it reduces the global workload completion time while making smarter use of the underlying resources. To this end, we present a thorough study of adaptive workload processing, showing the detailed behavior of our framework in representative experiments and the low overhead that our reconfiguration involves.
Submitted 28 May, 2020; v1 submitted 12 May, 2020;
originally announced May 2020.
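The three possible answers of the resource manager (expand, shrink, or none) can be sketched as a small decision function. This is an illustrative policy only, not the DMR API: the function name, its parameters, and the exact conditions (expand only when the queue is empty, shrink only when releasing nodes lets a queued job start) are assumptions made for the example.

```python
from enum import Enum
from typing import Sequence


class Action(Enum):
    EXPAND = "expand"
    SHRINK = "shrink"
    NONE = "none"


def decide_reconfiguration(idle_nodes: int,
                           queued_job_sizes: Sequence[int],
                           current_nodes: int,
                           min_nodes: int,
                           max_nodes: int) -> Action:
    """Hypothetical throughput-aware policy for one malleable job."""
    releasable = current_nodes - min_nodes

    # Shrink when giving up nodes would let a waiting job start.
    if releasable > 0 and any(
        idle_nodes < size <= idle_nodes + releasable
        for size in queued_job_sizes
    ):
        return Action.SHRINK

    # Expand when nobody is waiting and spare nodes exist.
    if not queued_job_sizes and idle_nodes > 0 and current_nodes < max_nodes:
        return Action.EXPAND

    # Otherwise no change can improve global productivity.
    return Action.NONE


if __name__ == "__main__":
    print(decide_reconfiguration(4, [], 8, 2, 16))
    print(decide_reconfiguration(0, [4], 8, 2, 16))
    print(decide_reconfiguration(0, [32], 8, 2, 16))
```

A real resource manager would also decide how many nodes to add or release; the sketch only returns the kind of action.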
-
Implementing Software Resiliency in HPX for Extreme Scale Computing
Authors:
Nikunj Gupta,
Jackson R. Mayo,
Adrian S. Lemoine,
Hartmut Kaiser
Abstract:
Exceptions and errors that occur within mission-critical applications due to hardware failures have a high cost. With the emerging Next Generation Platforms (NGPs), the rate of hardware failures will invariably increase. Designing applications to be resilient is therefore critical to retaining the reliability of results while meeting constraints on power budgets. In this paper, we implement software resilience in HPX, an Asynchronous Many-Task runtime system. We implement two resiliency APIs that we expose to application developers, namely task replication and task replay. Task replication launches n copies of a task and executes them asynchronously. Task replay reschedules a task up to n times until a valid output is returned. Furthermore, we introduce an API that allows the application to verify the returned result with a user-provided predicate. We test the APIs with both artificial workloads and a dataflow-based stencil application. We demonstrate that only minor overheads are incurred when using these resiliency features for workloads where the task size is greater than 200 μs. We also show that most of the added execution time arises from the replay or replication of the tasks themselves and not from the implementation of the APIs.
Submitted 15 April, 2020;
originally announced April 2020.
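Task replication with a user-provided predicate, as described in this abstract, can be sketched with Python's standard `concurrent.futures` thread pool. This is not HPX code: `replicate_validate` and `make_faulty_square` are hypothetical names, and returning the first valid replica in submission order is an assumption of the sketch.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")


def replicate_validate(n: int, predicate: Callable[[T], bool],
                       task: Callable[..., T], *args) -> T:
    """Launch `n` copies of `task` concurrently and return the first
    result (in submission order) that satisfies `predicate`."""
    if n < 1:
        raise ValueError("n must be at least 1")
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(task, *args) for _ in range(n)]
        for future in futures:
            try:
                result = future.result()
            except Exception:
                continue  # a crashed replica is simply skipped
            if predicate(result):
                return result
    raise RuntimeError(f"none of the {n} replicas produced a valid result")


def make_faulty_square(bad_replicas: int) -> Callable[[int], int]:
    """Hypothetical task: the first `bad_replicas` calls return garbage."""
    lock = threading.Lock()
    state = {"calls": 0}

    def square(x: int) -> int:
        with lock:  # replicas run on several threads
            state["calls"] += 1
            call = state["calls"]
        return -1 if call <= bad_replicas else x * x

    return square


if __name__ == "__main__":
    task = make_faulty_square(bad_replicas=2)
    print(replicate_validate(3, lambda r: r >= 0, task, 7))
```

Unlike replay, replication always pays for all n executions, which trades extra work for lower latency when faults do occur.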
-
Performance and Energy Optimization of Matrix Multiplication on Asymmetric big.LITTLE Processors
Authors:
Sandra Catalán,
Francisco D. Igual,
Rafael Mayo,
Luis Piñuel,
Enrique S. Quintana-Ortí,
Rafael Rodríguez-Sánchez
Abstract:
Asymmetric processors have emerged as an appealing technology for severely energy-constrained environments, especially in the mobile market, where heterogeneity in applications is mainstream. In addition, given the growing interest in ultra-low-power architectures for high performance computing, this type of platform is also being investigated on the road towards the implementation of energy-efficient high-performance scientific applications. In this paper, we propose a first step towards a complete implementation of the BLAS interface adapted to asymmetric ARM big.LITTLE processors, analyzing the trade-offs between performance and energy efficiency when compared to existing homogeneous (symmetric) multi-threaded BLAS implementations. Our experimental results reveal important gains in performance while maintaining the energy efficiency of homogeneous solutions by efficiently exploiting all the resources of the asymmetric processor.
Submitted 17 July, 2015;
originally announced July 2015.
-
Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors
Authors:
Sandra Catalán,
Francisco D. Igual,
Rafael Mayo,
Rafael Rodríguez-Sánchez,
Enrique S. Quintana-Ortí
Abstract:
Asymmetric multicore processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances, where heterogeneity in applications is mainstream. In addition, given the growing interest in low-power high performance computing, this type of architecture is also being investigated as a means to improve the throughput-per-Watt of complex scientific applications.
In this paper, we design and embed several architecture-aware optimizations into a multi-threaded general matrix multiplication (gemm), a key operation of the BLAS, in order to obtain a high performance implementation for ARM big.LITTLE AMPs. Our solution is based on the reference implementation of gemm in the BLIS library, and integrates a cache-aware configuration as well as asymmetric-static and dynamic scheduling strategies that carefully tune and distribute the operation's micro-kernels among the big and LITTLE cores of the target processor. The experimental results on a Samsung Exynos 5422, a system-on-chip with ARM Cortex-A15 and Cortex-A7 clusters that implements the big.LITTLE model, show that our cache-aware versions of gemm with asymmetric scheduling attain important gains in performance with respect to their architecture-oblivious counterparts while exploiting all the resources of the AMP to deliver considerable energy efficiency.
Submitted 30 June, 2015;
originally announced June 2015.
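The idea behind the asymmetric-static scheduling mentioned in this abstract can be illustrated by splitting a loop's iteration space between the two clusters in proportion to their aggregate throughput. The sketch below is a simplification and not the BLIS implementation: the function name, the core counts, and the `speed_ratio` value are illustrative assumptions.

```python
from typing import Tuple


def split_static_asymmetric(total_blocks: int,
                            big_cores: int,
                            little_cores: int,
                            speed_ratio: float) -> Tuple[range, range]:
    """Partition `total_blocks` loop iterations between two clusters.

    `speed_ratio` is the assumed throughput of one big core relative
    to one LITTLE core (a tuning parameter, not a measured value).
    """
    if total_blocks < 0 or big_cores < 0 or little_cores < 0:
        raise ValueError("sizes must be non-negative")
    big_weight = big_cores * speed_ratio
    little_weight = float(little_cores)
    if big_weight + little_weight == 0:
        raise ValueError("at least one core is required")

    # The big cluster receives a share proportional to its weight.
    big_share = round(total_blocks * big_weight / (big_weight + little_weight))
    big_share = min(max(big_share, 0), total_blocks)
    return range(0, big_share), range(big_share, total_blocks)


if __name__ == "__main__":
    big, little = split_static_asymmetric(100, big_cores=4,
                                          little_cores=4, speed_ratio=3.0)
    print(len(big), len(little))
```

A dynamic strategy would instead let each cluster fetch the next chunk of iterations whenever it becomes idle, which removes the need to guess `speed_ratio` in advance.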
-
Challenges and characterization of a Biological system on Grid by means of the PhyloGrid application
Authors:
Raul Isea,
Esther Montes,
Antonio J. Rubio-Montero,
Rafael Mayo
Abstract:
In this work we present a new application that is under development. PhyloGrid is able to perform large-scale phylogenetic calculations, such as those carried out to estimate the phylogeny of all the sequences already stored in the public NCBI database. Further analysis has focused on examining the origin of the HIV-1 disease using a large set of sequences totaling 2900 taxa. This study was made possible by implementing a workflow in Taverna.
Submitted 5 December, 2014;
originally announced February 2015.
-
Influence and Dynamic Behavior in Random Boolean Networks
Authors:
C. Seshadhri,
Yevgeniy Vorobeychik,
Jackson R. Mayo,
Robert C. Armstrong,
Joseph R. Ruthruff
Abstract:
We present a rigorous mathematical framework for analyzing dynamics of a broad class of Boolean network models. We use this framework to provide the first formal proof of many of the standard critical transition results in Boolean network analysis, and offer analogous characterizations for novel classes of random Boolean networks. We precisely connect the short-run dynamic behavior of a Boolean network to the average influence of the transfer functions. We show that some of the assumptions traditionally made in the more common mean-field analysis of Boolean networks do not hold in general.
For example, we offer some evidence that imbalance, or expected internal inhomogeneity, of transfer functions is a crucial feature that tends to drive quiescent behavior far more strongly than previously observed.
Submitted 19 July, 2011;
originally announced July 2011.
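To make the notion of influence of a transfer function concrete, the sketch below uses the standard definition from the analysis of Boolean functions: the influence of input i is the fraction of input assignments for which flipping bit i changes the output. Taking the mean over all inputs as the "average influence" is an assumption of this sketch; the paper's exact normalization may differ.

```python
from itertools import product
from typing import Callable, List, Tuple

BoolFn = Callable[[Tuple[int, ...]], int]


def influences(f: BoolFn, k: int) -> List[float]:
    """Influence of each of the k inputs of f under uniform random inputs."""
    result = []
    for i in range(k):
        changed = 0
        for bits in product((0, 1), repeat=k):
            flipped = bits[:i] + (1 - bits[i],) + bits[i + 1:]
            if f(bits) != f(flipped):
                changed += 1
        result.append(changed / 2 ** k)
    return result


def average_influence(f: BoolFn, k: int) -> float:
    """Mean influence over the k inputs (normalization assumed here)."""
    return sum(influences(f, k)) / k


def xor2(bits: Tuple[int, ...]) -> int:
    return bits[0] ^ bits[1]


def and2(bits: Tuple[int, ...]) -> int:
    return bits[0] & bits[1]


if __name__ == "__main__":
    print(average_influence(xor2, 2))  # 1.0
    print(average_influence(and2, 2))  # 0.5
```

The balanced function XOR reaches the maximum average influence, while the imbalanced AND, which outputs 1 on only one of four inputs, has half that value; this is the kind of relationship between imbalance and influence that the abstract refers to.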
-
Advances in the Biomedical Applications of the EELA Project
Authors:
Vicente Hernández,
Ignacio Blanquer,
Gabriel Aparicio,
Raul Isea,
Juan Luis Chavés,
Álvaro Hernández,
Henry Ricardo Mora,
Manuel Fernández,
Alicia Acero,
Esther Montes,
Rafael Mayo
Abstract:
In recent years, an increasing demand for Grid infrastructures has resulted in several international collaborations. This is the case of the EELA Project, which has brought together collaborating groups from Latin America and Europe. One year ago we presented this e-infrastructure, which is used, among others, by the Biomedical groups for studies in oncological analysis, neglected diseases, sequence alignments and computational phylogenetics. The advances achieved since then are summarised in this paper.
Submitted 17 December, 2010;
originally announced December 2010.
-
PhyloGrid: a development for a workflow in Phylogeny
Authors:
Esther Montes,
Raul Isea,
Rafael Mayo
Abstract:
In this work we present the development of a workflow based on Taverna that will be used for calculations in Phylogeny by means of the MrBayes tool. It has a friendly interface developed with the Gridsphere framework. The user is able to define the parameters for the Bayesian calculation, determine the model of evolution, check the accuracy of the results in the intermediate stages, and perform a multiple alignment of the sequences prior to the final result. The user needs no knowledge of the underlying computational procedure to do this.
Submitted 17 December, 2010;
originally announced December 2010.