-
ZSMILES: an approach for efficient SMILES storage for random access in Virtual Screening
Authors:
Gianmarco Accordi,
Davide Gadioli,
Giorgio Seguini,
Andrea R. Beccari,
Gianluca Palermo
Abstract:
Virtual screening is a technique used in drug discovery to select the most promising molecules to test in a lab. To perform virtual screening, we need a large set of molecules as input, and storing these molecules can become an issue. In fact, extreme-scale high-throughput virtual screening applications require a big dataset of input molecules and produce an even bigger dataset as output. These molecule databases occupy tens of TB of storage space, and domain experts frequently sample a small portion of this data. In this context, SMILES is a popular data format for storing large sets of molecules since it requires significantly less space to represent molecules than other formats (e.g., MOL2, SDF). This paper proposes an efficient dictionary-based approach to compress SMILES-based datasets. This approach takes advantage of domain knowledge to provide a readable output with separable SMILES, enabling random access. We examine the benefits of storing these datasets using ZSMILES to reduce the cold storage footprint in HPC systems. The main contributions concern a custom dictionary-based approach and a data pre-processing step. Experimental results show that ZSMILES leverages domain knowledge to compress 1.13x more than the state of the art in similar scenarios, reaching a compression ratio of up to $0.29$. We tested a CUDA version of ZSMILES targeting NVIDIA GPUs, showing a potential speedup of 7x.
Submitted 30 April, 2024;
originally announced April 2024.
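The idea of a dictionary-based, line-separable encoding described in the abstract above can be illustrated with a minimal sketch. This is not the ZSMILES implementation: the toy dictionary, the reserved code symbols, the greedy longest-match strategy, and every function name below are assumptions made for illustration only. The sketch also assumes the reserved symbols never occur in the input, and it takes the compression ratio to mean compressed size divided by original size.

```python
# Toy dictionary: frequent SMILES substrings -> single printable symbols.
# Patterns and symbols are illustrative assumptions, not the ZSMILES dictionary.
TOY_DICTIONARY = {
    "c1ccccc1": "!",
    "C(=O)": "?",
    "CC": ";",
    "(C)": "_",
}
DECODE = {symbol: pattern for pattern, symbol in TOY_DICTIONARY.items()}
# Try longer patterns first so the greedy match prefers the longest one.
PATTERNS = sorted(TOY_DICTIONARY, key=len, reverse=True)


def compress_line(smiles: str) -> str:
    """Greedily replace dictionary patterns in one SMILES string."""
    if any(symbol in smiles for symbol in DECODE):
        raise ValueError("input contains a reserved code symbol")
    out = []
    i = 0
    while i < len(smiles):
        for pattern in PATTERNS:
            if smiles.startswith(pattern, i):
                out.append(TOY_DICTIONARY[pattern])
                i += len(pattern)
                break
        else:
            # No pattern matches here: copy the character unchanged.
            out.append(smiles[i])
            i += 1
    return "".join(out)


def decompress_line(line: str) -> str:
    """Expand every code symbol back to its pattern."""
    return "".join(DECODE.get(ch, ch) for ch in line)


def compress_dataset(smiles_list) -> str:
    """One compressed molecule per line, so lines stay separable."""
    return "\n".join(compress_line(s) for s in smiles_list)


def random_access(compressed: str, index: int) -> str:
    """Decode a single molecule without decoding the others."""
    return decompress_line(compressed.split("\n")[index])


def compression_ratio(original: str, compressed: str) -> float:
    """Compressed size over original size (lower means smaller output)."""
    return len(compressed) / len(original)


molecules = ["CC(=O)Oc1ccccc1C(=O)O", "c1ccccc1", "CCO"]
packed = compress_dataset(molecules)
print(random_access(packed, 1))
print(compression_ratio("\n".join(molecules), packed))
```

Because each molecule is encoded on its own line with no state shared between lines, any single entry can be decoded in isolation. Here the line is located by splitting the whole text; a real store would more likely keep an index of line offsets.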
-
Tunable and Portable Extreme-Scale Drug Discovery Platform at Exascale: the LIGATE Approach
Authors:
Gianluca Palermo,
Gianmarco Accordi,
Davide Gadioli,
Emanuele Vitali,
Cristina Silvano,
Bruno Guindani,
Danilo Ardagna,
Andrea R. Beccari,
Domenico Bonanni,
Carmine Talarico,
Filippo Lunghini,
Jan Martinovic,
Paulo Silva,
Ada Bohm,
Jakub Beranek,
Jan Krenek,
Branislav Jansik,
Luigi Crisci,
Biagio Cosenza,
Peter Thoman,
Philip Salzmann,
Thomas Fahringer,
Leila Alexander,
Gerardo Tauriello, et al. (10 additional authors not shown)
Abstract:
Today's digital revolution is having a dramatic impact on the pharmaceutical industry and the entire healthcare system. The implementation of machine learning, extreme-scale computer simulations, and big data analytics in the drug design and development process offers an excellent opportunity to lower the risk of investment and reduce the time to the patient.
Within the LIGATE project, we aim to integrate, extend, and co-design best-in-class European components to design Computer-Aided Drug Design (CADD) solutions exploiting today's high-end supercomputers and tomorrow's Exascale resources, fostering European competitiveness in the field.
The proposed LIGATE solution is a fully integrated workflow that delivers the result of a virtual screening campaign for drug discovery with the highest speed along with the highest accuracy. The full automation of the solution and the possibility of running it on multiple supercomputing centers at once make it possible to run an extreme-scale in silico drug discovery campaign in a few days, for example to respond promptly to a worldwide pandemic crisis.
Submitted 19 April, 2023;
originally announced April 2023.
-
Improving computation efficiency using input and architecture features for a virtual screening application
Authors:
Gianmarco Accordi,
Emanuele Vitali,
Davide Gadioli,
Luigi Crisci,
Biagio Cosenza,
Mauro Bisson,
Massimiliano Fatica,
Andrea Beccari,
Gianluca Palermo
Abstract:
Virtual screening is an early stage of the drug discovery process that selects the most promising candidates. In the urgent computing scenario, it is critical to find a solution in a short time frame. In this paper, we focus on a real-world virtual screening application to evaluate out-of-kernel optimizations that consider input and architecture features to improve the computation efficiency on GPU. Experimental results on a modern supercomputer node show that we can almost double the performance. Moreover, we implemented the optimization using SYCL, and it provides a benefit consistent with that of the CUDA optimization. A virtual screening campaign can use this gain in performance to increase the number of evaluated candidates, improving the probability of finding a drug.
Submitted 9 March, 2023;
originally announced March 2023.