-
Workflows to driving high-performance interactive supercomputing for urgent decision making
Authors:
Nick Brown,
Rupert Nash,
Gordon Gibb,
Evgenij Belikov,
Artur Podobas,
Wei Der Chien,
Stefano Markidis,
Markus Flatken,
Andreas Gerndt
Abstract:
Interactive urgent computing is a small but growing user of supercomputing resources. However, there are numerous technical challenges that must be overcome to make supercomputers fully suited to the wide range of urgent workloads which could benefit from the computational power delivered by such instruments. An important question is how to connect the different components of an urgent workload, namely the users, the simulation codes, and external data sources, together in a structured and accessible manner.
In this paper we explore the role of workflows both for the marshalling and control of urgent workloads and at the level of the individual HPC machine. This ultimately requires two workflow systems and, using a space weather prediction urgent use-case, we explore the benefits that these two workflow systems provide, especially when one exploits the flexibility enabled by their interoperation.
Submitted 28 June, 2022;
originally announced June 2022.
-
Predicting batch queue job wait times for informed scheduling of urgent HPC workloads
Authors:
Nick Brown,
Gordon Gibb,
Evgenij Belikov,
Rupert Nash
Abstract:
There is increasing interest in the use of HPC machines for urgent workloads to help tackle disasters as they unfold. Whilst batch queue systems are not ideal in supporting such workloads, many disadvantages can be worked around by accurately predicting when a waiting job will start to run. However, there are numerous challenges in achieving such a prediction with high accuracy, not least because the queue's state can change rapidly and depends upon many factors. In this work we explore a novel machine learning approach for predicting queue wait times, hypothesising that such a model can capture the complex behaviour resulting from the queue policy and other interactions to generate accurate job start times.
For ARCHER2 (HPE Cray EX), Cirrus (HPE 8600) and 4-cabinet (HPE Cray EX) we explore how different machine learning approaches and techniques improve the accuracy of our predictions, comparing against the estimates generated by Slurm. We demonstrate that our techniques deliver the most accurate predictions across our machines of interest: we can predict job start times to within one minute of the actual start time for around 65% of jobs on ARCHER2 and 4-cabinet, and for 76% of jobs on Cirrus. Compared with what Slurm can deliver, this represents around 3.8 times better accuracy on ARCHER2 and 18 times better on Cirrus. Furthermore, our approach can accurately predict the start time of three quarters of all jobs to within ten minutes of the actual start time on ARCHER2 and 4-cabinet, and of 90% of jobs on Cirrus. Whilst the driver of this work has been to better facilitate placement of urgent workloads across HPC machines, the insights gained can also provide wider benefits to users, enrich existing batch queue systems, and inform policy.
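The headline figures above are "fraction of jobs predicted to within a tolerance" statistics. A minimal sketch of such a metric follows, assuming predicted and actual start times are given in seconds on a common clock; the function name `fraction_within` is hypothetical, and the paper's actual evaluation code is not described in the abstract. Tolerances of one minute and ten minutes correspond to 60 s and 600 s.

```python
from typing import Sequence


def fraction_within(predicted: Sequence[float], actual: Sequence[float], tolerance_s: float) -> float:
    """Fraction of jobs whose predicted start time is within tolerance_s seconds of the actual start.

    Hypothetical helper for illustration; times are assumed to be in seconds on the same clock.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    # A prediction counts as a hit if its absolute error is no larger than the tolerance.
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance_s)
    return hits / len(actual)
```

For example, with four made-up jobs, `fraction_within([30, 100, 500, 350], [0, 100, 200, 300], 60)` counts three predictions as hits and returns 0.75; widening the tolerance to 600 s counts all four.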
Submitted 28 April, 2022;
originally announced April 2022.
-
Utilising urgent computing to tackle the spread of mosquito-borne diseases
Authors:
Nick Brown,
Rupert Nash,
Piero Poletti,
Giorgio Guzzetta,
Mattia Manica,
Agnese Zardini,
Markus Flatken,
Jules Vidal,
Charles Gueunet,
Evgenij Belikov,
Julien Tierny,
Artur Podobas,
Wei Der Chien,
Stefano Markidis,
Andreas Gerndt
Abstract:
It is estimated that around 80% of the world's population live in areas susceptible to at least one major vector-borne disease, and approximately 20% of global communicable diseases are spread by mosquitoes. Furthermore, outbreaks of such diseases are becoming more common and widespread, with much of this driven in recent years by socio-demographic and climatic factors. These trends are causing significant concern to global health organisations, including the CDC and WHO, and so an important question is the role that technology can play in addressing them.
In this work we describe the integration of an epidemiology model, which simulates the spread of mosquito-borne diseases, with the VESTEC urgent computing ecosystem. The intention of this work is to empower human health professionals to exploit this model and more easily explore the progression of mosquito-borne diseases. Such modelling has traditionally been the domain of a few research scientists; by leveraging state-of-the-art visualisation and analytics techniques, all supported by running the computational workloads on HPC machines in a seamless fashion, we demonstrate the significant advantages that such an integration can provide. Furthermore, we demonstrate the benefits of using an ecosystem such as VESTEC, which provides a framework for urgent computing, in supporting the wider adoption of these technologies by epidemiologists and disaster response professionals.
Submitted 10 November, 2021;
originally announced November 2021.
-
The role of interactive super-computing in using HPC for urgent decision making
Authors:
Nick Brown,
Rupert Nash,
Gordon Gibb,
Bianca Prodan,
Max Kontak,
Vyacheslav Olshevsky,
Wei Der Chien
Abstract:
Technological advances are creating exciting new opportunities that have the potential to move HPC well beyond traditional computational workloads. In this paper we focus on the potential for HPC to be instrumental in responding to disasters such as wildfires, hurricanes, extreme flooding, earthquakes, tsunamis, winter weather conditions, and accidents. Driven by the EU-funded H2020 VESTEC project, our research looks to prove HPC as a tool not only capable of simulating disasters once they have happened, but also able to operate in a responsive mode, supporting disaster response teams making urgent decisions in real time. Whilst this has the potential to revolutionise disaster response, it requires the ability to drive HPC interactively, both from the user's perspective and in response to the arrival of data. As such, interactivity is a critical component in enabling HPC to support disaster response teams, so that urgent decision makers can make the correct decision first time, every time.
Submitted 17 October, 2020;
originally announced October 2020.
-
The Technologies Required for Fusing HPC and Real-Time Data to Support Urgent Computing
Authors:
Gordon Gibb,
Rupert Nash,
Nick Brown,
Bianca Prodan
Abstract:
The use of High Performance Computing (HPC) to complement urgent decision making in the event of disasters is an important potential future use of supercomputers. However, the usage modes involved are rather different from how HPC has been used traditionally. As such, there are many obstacles that need to be overcome, not least the unbounded wait times in batch system queues, before the use of HPC in disaster response becomes practical. In this paper, we present how the VESTEC project plans to overcome these issues and develop a working prototype of an urgent computing control system. We describe the requirements for such a system and analyse the different technologies available that can be leveraged to build it. Finally, we explore the design of the VESTEC system and discuss the ongoing challenges that must be addressed to realise a production-level system.
Submitted 4 October, 2020;
originally announced October 2020.
-
Supercomputing with MPI meets the Common Workflow Language standards: an experience report
Authors:
Rupert W. Nash,
Nick Brown,
Michael R. Crusoe,
Max Kontak
Abstract:
The use of standards-based workflows is still somewhat unusual among high-performance computing users. In this paper we describe the experience of using the Common Workflow Language (CWL) standards to describe the execution, in parallel, of MPI-parallelised applications. In particular, we motivate and describe the simple extension to the specification which was required, as well as our implementation of this within the CWL reference runner. We discuss some of the unexpected benefits, such as the simple use of HPC-oriented performance measurement tools and the interfacing of CWL software requirements with HPC module systems. We close with a request for comment from the community on how these features could be adopted within future versions of the CWL standards.
Submitted 1 October, 2020;
originally announced October 2020.
-
An Introduction to Convolutional Neural Networks
Authors:
Keiron O'Shea,
Ryan Nash
Abstract:
The field of machine learning has taken a dramatic twist in recent times, with the rise of the Artificial Neural Network (ANN). These biologically inspired computational models are able to far exceed the performance of previous forms of artificial intelligence in common machine learning tasks. One of the most impressive forms of ANN architecture is that of the Convolutional Neural Network (CNN). CNNs are primarily used to solve difficult image-driven pattern recognition tasks and, with their precise yet simple architecture, offer a simplified method of getting started with ANNs.
This document provides a brief introduction to CNNs, discussing recently published papers and newly formed techniques in developing these brilliantly fantastic image recognition models. This introduction assumes you are familiar with the fundamentals of ANNs and machine learning.
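The core operation of a CNN's convolutional layer is sliding a small learned kernel over the input and taking a weighted sum at each position, usually followed by a non-linearity. A minimal pure-Python sketch follows, assuming a single-channel input, no padding ("valid" mode) and a ReLU activation; like most deep learning libraries, it actually computes a cross-correlation (the kernel is not flipped). The function names are illustrative and not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix, stride: int = 1) -> Matrix:
    """'Valid' 2D cross-correlation of a single-channel image with one kernel (no padding)."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    # Output size without padding: (input - kernel) // stride + 1 in each dimension.
    out_h = (ih - kh) // stride + 1
    out_w = (iw - kw) // stride + 1
    out: Matrix = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            # Weighted sum of the receptive field under the kernel.
            for u in range(kh):
                for v in range(kw):
                    acc += image[i * stride + u][j * stride + v] * kernel[u][v]
            row.append(acc)
        out.append(row)
    return out


def relu(feature_map: Matrix) -> Matrix:
    """Element-wise rectified linear unit, a common CNN activation."""
    return [[max(0.0, val) for val in row] for row in feature_map]
```

With the kernel `[[-1, 1]]`, which responds to a left-to-right increase in intensity, the image `[[0, 0, 1, 1], [0, 0, 1, 1]]` produces the feature map `[[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]`: the response is non-zero only at the vertical edge. In a trained CNN the kernel values are learned rather than hand-chosen.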
Submitted 2 December, 2015; v1 submitted 26 November, 2015;
originally announced November 2015.
-
Weighted decomposition in high-performance lattice-Boltzmann simulations: are some lattice sites more equal than others?
Authors:
Derek Groen,
David Abou Chacra,
Rupert W. Nash,
Jiri Jaros,
Miguel O. Bernabeu,
Peter V. Coveney
Abstract:
Obtaining a good load balance is a significant challenge in scaling up lattice-Boltzmann simulations of realistic sparse problems to the exascale. Here we analyse the effect of weighted decomposition on the performance of the HemeLB lattice-Boltzmann simulation environment when applied to sparse domains. Prior to domain decomposition, we assign increased weights to wall and in/outlet sites, reflecting their increased computational cost. We combine our weighted decomposition with a second optimisation, which is to sort the lattice sites according to a space-filling curve. We tested these strategies on a sparse bifurcation and a very sparse aneurysm geometry, and find that using weights reduces calculation load imbalance by up to 85%, although the overall communication overhead is higher than in some of our other runs.
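The two ideas in this abstract, per-site cost weights and ordering sites along a space-filling curve, can be combined in a very simple partitioner: sort the sites by their curve key, then cut the curve into contiguous chunks of roughly equal total weight. The sketch below assumes a Morton (Z-order) curve and hypothetical weights (`WEIGHTS`); the abstract does not say which curve or which weight values HemeLB uses, and this curve-cutting partitioner is a stand-in for illustration, not necessarily the decomposition method used in the paper.

```python
from typing import Dict, List, Tuple

Site = Tuple[int, int, int, str]  # (x, y, z, site_type)

# Hypothetical per-site cost weights: wall and in/outlet sites cost more than bulk fluid sites.
WEIGHTS: Dict[str, float] = {"fluid": 1.0, "wall": 2.0, "inlet": 3.0, "outlet": 3.0}


def morton_key(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the bits of x, y and z to obtain a Z-order (Morton) curve key."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (3 * b)
        key |= ((y >> b) & 1) << (3 * b + 1)
        key |= ((z >> b) & 1) << (3 * b + 2)
    return key


def weighted_decomposition(sites: List[Site], nparts: int) -> List[List[Site]]:
    """Sort sites along the curve, then cut it into nparts chunks of roughly equal total weight."""
    ordered = sorted(sites, key=lambda s: morton_key(s[0], s[1], s[2]))
    total = sum(WEIGHTS[s[3]] for s in ordered)
    target = total / nparts
    parts: List[List[Site]] = [[] for _ in range(nparts)]
    acc = 0.0
    for site in ordered:
        # The part index advances each time the cumulative weight passes a multiple of the target.
        idx = min(int(acc / target), nparts - 1)
        parts[idx].append(site)
        acc += WEIGHTS[site[3]]
    return parts
```

For a row of four fluid sites followed by two wall sites split over two parts, an unweighted even split (three sites each) would give costs of 3 and 5; the weighted version assigns four fluid sites to one part and the two wall sites to the other, giving equal costs of 4. Because each part is a contiguous stretch of the curve, spatially nearby sites tend to stay together, which helps keep communication between parts low.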
Submitted 17 October, 2014;
originally announced October 2014.
-
Computer simulations reveal complex distribution of haemodynamic forces in a mouse retina model of angiogenesis
Authors:
Miguel O. Bernabeu,
Martin Jones,
Jens H. Nielsen,
Timm Krüger,
Rupert W. Nash,
Derek Groen,
Sebastian Schmieschek,
James Hetherington,
Holger Gerhardt,
Claudio A. Franco,
Peter V. Coveney
Abstract:
There is currently limited understanding of the role played by haemodynamic forces in the processes governing vascular development. One of many obstacles to be overcome is being able to measure those forces, at the required resolution, on vessels only a few micrometres thick. In the current paper, we present an in silico method for computing the haemodynamic forces experienced by murine retinal vasculature (a widely used animal model of vascular development) beyond what is measurable experimentally. Our results show that it is possible to reconstruct high-resolution three-dimensional geometrical models directly from samples of retinal vasculature, and that the lattice-Boltzmann algorithm can be used to obtain accurate estimates of the haemodynamics in these domains. We generate flow models from samples obtained at postnatal days (P) 5 and 6. Our simulations show important differences between the flow patterns recovered in the two cases, including observations of regression occurring in areas where wall shear stress gradients exist. We propose two possible mechanisms to account for the observed increase in velocity and wall shear stress between P5 and P6: (i) the measured reduction in typical vessel diameter between the two time points, and (ii) the reduction in network density triggered by the pruning process. The methodology developed herein is applicable to other biomedical domains where microvasculature can be imaged but experimental flow measurements are unavailable or difficult to obtain.
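Why a smaller vessel diameter (mechanism (i)) raises both velocity and wall shear stress can be seen from the idealised Poiseuille solution for steady flow of a Newtonian fluid through a straight, rigid circular tube. With flow rate Q, radius r and viscosity μ, the mean velocity is U = Q / (π r²) and the wall shear stress is τ_w = 4μQ / (π r³). This sketch only illustrates the scaling under those assumptions, holding Q fixed; it is not the lattice-Boltzmann method used in the paper, and the function names are hypothetical.

```python
import math
from typing import Tuple


def poiseuille_mean_velocity(q: float, radius: float) -> float:
    """Mean velocity (m/s) for flow rate q (m^3/s) through a circular tube of radius (m)."""
    return q / (math.pi * radius ** 2)


def poiseuille_wall_shear_stress(q: float, radius: float, mu: float) -> float:
    """Wall shear stress (Pa) for steady Poiseuille flow: tau_w = 4 * mu * q / (pi * r^3)."""
    return 4.0 * mu * q / (math.pi * radius ** 3)


def scaling_for_diameter_change(factor: float) -> Tuple[float, float]:
    """(velocity ratio, wall shear stress ratio) when the diameter is multiplied by factor at fixed q."""
    return 1.0 / factor ** 2, 1.0 / factor ** 3
```

Under these assumptions a 10% reduction in diameter (`factor=0.9`) at a fixed flow rate raises the mean velocity by about 23% and the wall shear stress by about 37%, so even modest narrowing has a pronounced effect on wall shear stress.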
Submitted 15 July, 2014; v1 submitted 7 November, 2013;
originally announced November 2013.
-
Impact of blood rheology on wall shear stress in a model of the middle cerebral artery
Authors:
Miguel O. Bernabeu,
Rupert W. Nash,
Derek Groen,
Hywel B. Carver,
James Hetherington,
Timm Krüger,
Peter V. Coveney
Abstract:
Perturbations to the homeostatic distribution of mechanical forces exerted by blood on the endothelial layer have been correlated with vascular pathologies including intracranial aneurysms and atherosclerosis. Recent computational work suggests that in order to correctly characterise such forces, the shear-thinning properties of blood must be taken into account. To the best of our knowledge, these findings have never been compared against experimentally observed pathological thresholds. In the current work, we apply the three-band diagram (TBD) analysis due to Gizzi et al. to assess the impact of the choice of blood rheology model on a computational model of the right middle cerebral artery. Our results show that, in the model under study, the differences between the wall shear stress predicted by a Newtonian model and the well-known Carreau-Yasuda generalised Newtonian model are only significant if the vascular pathology under study is associated with a pathological threshold in the range 0.94 Pa to 1.56 Pa, where the results of the TBD analysis of the rheology models considered differ. Otherwise, we observe no significant differences.
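The Carreau-Yasuda model makes viscosity a decreasing function of shear rate γ̇: μ(γ̇) = μ_∞ + (μ_0 − μ_∞)[1 + (λγ̇)^a]^((n−1)/a). The sketch below evaluates it alongside a constant Newtonian viscosity and the resulting wall shear stress magnitude τ = μγ̇. All parameter values are placeholders of a plausible order of magnitude for blood and are assumptions here, not the values used in the paper; the function names are likewise hypothetical.

```python
def carreau_yasuda_viscosity(
    shear_rate: float,
    mu0: float = 0.16,       # zero-shear viscosity, Pa s (placeholder)
    mu_inf: float = 0.0035,  # infinite-shear viscosity, Pa s (placeholder)
    lam: float = 8.2,        # relaxation time, s (placeholder)
    a: float = 0.64,         # Yasuda transition exponent (placeholder)
    n: float = 0.2128,       # power-law index (placeholder)
) -> float:
    """Generalised Newtonian viscosity: mu_inf + (mu0 - mu_inf) * [1 + (lam*g)^a]^((n-1)/a)."""
    return mu_inf + (mu0 - mu_inf) * (1.0 + (lam * shear_rate) ** a) ** ((n - 1.0) / a)


def wall_shear_stress(shear_rate: float, viscosity: float) -> float:
    """Magnitude of wall shear stress (Pa) for a given wall shear rate (1/s)."""
    return viscosity * shear_rate


NEWTONIAN_MU = 0.0035  # Pa s, assumed constant viscosity for the Newtonian comparison


def compare_models(shear_rate: float) -> tuple:
    """Return (Newtonian WSS, Carreau-Yasuda WSS) evaluated at the same wall shear rate."""
    return (
        wall_shear_stress(shear_rate, NEWTONIAN_MU),
        wall_shear_stress(shear_rate, carreau_yasuda_viscosity(shear_rate)),
    )
```

At zero shear the Carreau-Yasuda viscosity equals μ_0, and at very high shear it approaches μ_∞, so with these placeholders the two models agree mainly in high-shear regions. Comparing them at the same shear rate is a simplification: in a full simulation the choice of rheology also changes the flow field, and therefore the shear rates themselves, which is why the paper compares complete simulations.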
Submitted 15 July, 2014; v1 submitted 22 November, 2012;
originally announced November 2012.
-
Flexible composition and execution of high performance, high fidelity multiscale biomedical simulations
Authors:
Derek Groen,
Joris Borgdorff,
Carles Bona-Casas,
James Hetherington,
Rupert W. Nash,
Stefan J. Zasada,
Ilya Saverchenko,
Mariusz Mamonski,
Krzysztof Kurowski,
Miguel O. Bernabeu,
Alfons G. Hoekstra,
Peter V. Coveney
Abstract:
Multiscale simulations are essential in the biomedical domain to accurately model human physiology. We present a modular approach for designing, constructing and executing multiscale simulations on a wide range of resources, from desktops to petascale supercomputers, including combinations of these. Our work features two multiscale applications, in-stent restenosis and cerebrovascular blood flow, which combine multiple existing single-scale applications to create a multiscale simulation. These applications can be efficiently coupled, deployed and executed on computers up to the largest (peta) scale, incurring a coupling overhead of 1 to 10% of the total execution time.
Submitted 28 January, 2013; v1 submitted 13 November, 2012;
originally announced November 2012.
-
Coalesced communication: a design pattern for complex parallel scientific software
Authors:
Hywel B. Carver,
Derek Groen,
James Hetherington,
Rupert W. Nash,
Miguel O. Bernabeu,
Peter V. Coveney
Abstract:
We present a new design pattern for high-performance parallel scientific software, named coalesced communication. This pattern provides a structured way to improve communication performance by coalescing multiple communication needs using two communication management components. We apply the design pattern to several simulations of a lattice-Boltzmann blood flow solver with streaming visualisation, which yields a reduction in communication overhead of approximately 40%.
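The essence of coalescing is that independent parts of a code (for example, halo exchange and visualisation) register their communication needs with a manager, which then sends one combined message per destination rather than one message per need. The sketch below illustrates that idea without MPI, assuming payloads are lists of floats; the class `CoalescingSender` and the function `unpack` are hypothetical and do not reproduce the paper's two management components. In an MPI code, each entry returned by `flush` would correspond to a single non-blocking send.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Index = List[Tuple[str, int, int]]  # (tag, offset, length) for each coalesced need


class CoalescingSender:
    """Hypothetical manager that gathers many small sends and emits one buffer per destination."""

    def __init__(self) -> None:
        self._pending: Dict[int, List[Tuple[str, List[float]]]] = defaultdict(list)

    def request_send(self, dest: int, tag: str, payload: List[float]) -> None:
        """Register one communication need; tags are assumed unique per destination."""
        self._pending[dest].append((tag, list(payload)))

    def flush(self) -> Dict[int, Tuple[List[float], Index]]:
        """Pack each destination's needs into one contiguous buffer plus an index of offsets."""
        messages: Dict[int, Tuple[List[float], Index]] = {}
        for dest, needs in self._pending.items():
            buffer: List[float] = []
            index: Index = []
            for tag, payload in needs:
                index.append((tag, len(buffer), len(payload)))
                buffer.extend(payload)
            messages[dest] = (buffer, index)
        self._pending.clear()
        return messages


def unpack(buffer: List[float], index: Index) -> Dict[str, List[float]]:
    """Split a received coalesced buffer back into its individual payloads."""
    return {tag: buffer[off:off + n] for tag, off, n in index}
```

Registering two needs for rank 1 and one for rank 2 produces two messages instead of three; rank 1's buffer is `[1.0, 2.0, 3.0]` when its needs were `[1.0, 2.0]` and `[3.0]`. Fewer, larger messages reduce per-message latency costs, which is the general mechanism by which coalescing can lower communication overhead. In a real code the index would typically be agreed in advance rather than shipped with each message.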
Submitted 16 October, 2012;
originally announced October 2012.