Search | arXiv e-print repository

Improving computation efficiency using input and architecture features for a virtual screening application

Authors: Gianmarco Accordi, Emanuele Vitali, Davide Gadioli, Luigi Crisci, Biagio Cosenza, Mauro Bisson, Massimiliano Fatica, Andrea Beccari, Gianluca Palermo

Abstract: Virtual screening is an early stage of the drug discovery process that selects the most promising candidates. In the urgent computing scenario it is critical to find a solution in a short time frame. In this paper, we focus on a real-world virtual screening application to evaluate out-of-kernel optimizations, that consider input and architecture features to improve the computation efficiency on GPU. Experiment results on a modern supercomputer node show that we can almost double the performance. Moreover, we implemented the optimization using SYCL and it provides a consistent benefit with the CUDA optimization. A virtual screening campaign can use this gain in performance to increase the number of evaluated candidates, improving the probability of finding a drug.

Submitted 9 March, 2023; originally announced March 2023.

arXiv:2302.05391 [pdf, ps, other]

Israel coordinates for all static spherically symmetric spacetimes with vanishing second Ricci invariant

Authors: Yannick M. Bisson, Kayll Lake

Abstract: Static spherically symmetric spacetimes with vanishing second Ricci invariant constitute an important class of solutions to Einstein's equations and more generally as archetypes of regular black holes. When studying completeness one is most often presented with the Kruskal - Szekeres procedure. However, this procedure only works if the spacetime admits a single non-degenerate Killing horizon (a single bifurcation two-sphere). Here we generalize the Israel procedure to examine a constructive approach to completeness based entirely on the static spherically symmetric nature of spacetimes with a vanishing second Ricci invariant. It is shown by "block gluing" that the Israel procedure can cover two bifurcation two-spheres, but can fail with three. No coordinate transformations are used in this work.

Submitted 17 October, 2023; v1 submitted 10 February, 2023; originally announced February 2023.

Comments: Final form to appear in Physical Review D

Journal ref: Phys. Rev. D 108, 104017 (2023)

arXiv:2209.05069 [pdf, other]

GPU-optimized Approaches to Molecular Docking-based Virtual Screening in Drug Discovery: A Comparative Analysis

Authors: Emanuele Vitali, Federico Ficarelli, Mauro Bisson, Davide Gadioli, Massimiliano Fatica, Andrea R. Beccari, Gianluca Palermo

Abstract: COVID-19 has shown the importance of having a fast response against pandemics. Finding a novel drug is a very long and complex procedure, and it is possible to accelerate the preliminary phases by using computer simulations. In particular, virtual screening is an in-silico phase that is needed to filter a large set of possible drug candidates to a manageable number. This paper presents the implementations and a comparative analysis of two GPU-optimized implementations of a virtual screening algorithm targeting novel GPU architectures. The first adopts a traditional approach that spreads the computation required to evaluate a single molecule across the entire GPU. The second uses a batched approach that exploits the parallel architecture of the GPU to evaluate more molecules in parallel, without considering the latency to process a single molecule. The paper describes the advantages and disadvantages of the proposed solutions, highlighting implementation details that impact the performance. Experimental results highlight the different performance of the two methods on several target molecule databases while running on NVIDIA A100 GPUs. The two implementations have a strong dependency with respect to the data to be processed. For both cases, the performance is improving while reducing the dimension of the target molecules (number of atoms and rotatable bonds). The two methods demonstrated a different behavior with respect to the size of the molecule database to be screened.
While the latency one reaches sooner (with fewer molecules) the performance plateau in terms of throughput, the batched one requires a larger set of molecules. However, the performances after the initial transient period are much higher (up to 5x speed-up). Finally, to check the efficiency of both implementations we deeply analyzed their workload characteristics using the instruction roof-line methodology.

Submitted 12 September, 2022; originally announced September 2022.

arXiv:2102.09510 [pdf, other]

doi 10.1209/0295-5075/133/60005

How we are leading a 3-XORSAT challenge: from the energy landscape to the algorithm and its efficient implementation on GPUs

Authors: M. Bernaschi, M. Bisson, M. Fatica, E. Marinari, V. Martin-Mayor, G. Parisi, F. Ricci-Tersenghi

Abstract: A recent 3-XORSAT challenge required to minimize a very complex and rough energy function, typical of glassy models with a random first order transition and a golf course like energy landscape. We present the ideas beyond the quasi-greedy algorithm and its very efficient implementation on GPUs that are allowing us to rank first in such a competition. We suggest a better protocol to compare algorithmic performances and we also provide analytical predictions about the exponential growth of the times to find the solution in terms of free-energy barriers.

Submitted 24 February, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

Comments: 7 pages, 7 figures, EPL format + SM (2 pages)

Journal ref: EPL, 133 (2021) 60005

arXiv:1906.06297 [pdf, other]

doi 10.1016/j.cpc.2020.107473

A Performance Study of the 2D Ising Model on GPUs

Authors: Joshua Romero, Mauro Bisson, Massimiliano Fatica, Massimo Bernaschi

Abstract: The simulation of the two-dimensional Ising model is used as a benchmark to show the computational capabilities of Graphic Processing Units (GPUs). The rich programming environment now available on GPUs and flexible hardware capabilities allowed us to quickly experiment with several implementation ideas: a simple stencil-based algorithm, recasting the stencil operations into matrix multiplies to take advantage of Tensor Cores available on NVIDIA GPUs, and a highly optimized multi-spin coding approach. Using the managed memory API available in CUDA allows for simple and efficient distribution of these implementations across a multi-GPU NVIDIA DGX-2 server. We show that even a basic GPU implementation can outperform current results published on TPUs and that the optimized multi-GPU implementation can simulate very large lattices faster than custom FPGA solutions.

Submitted 14 June, 2019; originally announced June 2019.

arXiv:1408.1605 [pdf, other]

Parallel Distributed Breadth First Search on the Kepler Architecture

Authors: Mauro Bisson, Massimo Bernaschi, Enrico Mastrostefano

Abstract: We present the results obtained by using an evolution of our CUDA-based solution for the exploration, via a Breadth First Search, of large graphs. This latest version exploits at its best the features of the Kepler architecture and relies on a combination of techniques to reduce both the number of communications among the GPUs and the amount of exchanged data. The final result is a code that can visit more than 800 billion edges in a second by using a cluster equipped with 4096 Tesla K20X GPUs.

Submitted 23 December, 2014; v1 submitted 7 August, 2014; originally announced August 2014.

Comments: In this revision we adopt a technique to reduce the size of exchanged messages that relies on the use of a bitmap. This change halves, by itself, the total execution time. Now the code reaches 800 GTEPS on 4096 Kepler GPUs. We also made some modifications to the Introduction and to the performance section. Added new references

arXiv:1307.8276 [pdf, other]

GPU peer-to-peer techniques applied to a cluster interconnect

Authors: Roberto Ammendola, Massimo Bernaschi, Andrea Biagioni, Mauro Bisson, Massimiliano Fatica, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Enrico Mastrostefano, Pier Stanislao Paolucci, Davide Rossetti, Francesco Simula, Laura Tosoratto, Piero Vicini

Abstract: Modern GPUs support special protocols to exchange data directly across the PCI Express bus. While these protocols could be used to reduce GPU data transmission times, basically by avoiding staging to host memory, they require specific hardware features which are not available on current generation network adapters. In this paper we describe the architectural modifications required to implement peer-to-peer access to NVIDIA Fermi- and Kepler-class GPUs on an FPGA-based cluster interconnect. Besides, the current software implementation, which integrates this feature by minimally extending the RDMA programming model, is discussed, as well as some issues raised while employing it in a higher level API like MPI. Finally, the current limits of the technique are studied by analyzing the performance improvements on low-level benchmarks and on two GPU-accelerated applications, showing when and how they seem to benefit from the GPU peer-to-peer method.

Submitted 31 July, 2013; originally announced July 2013.

Comments: paper accepted to CASS 2013

Showing 1–7 of 7 results for author: Bisson, M