-
Selective Parallel Loading of Large-Scale Compressed Graphs with ParaGrapher
Authors:
Mohsen Koohi Esfahani,
Marco D'Antonio,
Syed Ibtisam Tauhidi,
Thai Son Mai,
Hans Vandierendonck
Abstract:
Comprehensive evaluation is one of the bases of experimental science. In High-Performance Graph Processing, a thorough evaluation of contributions becomes more achievable by supporting common input formats across different frameworks. However, each framework creates its own specific format, which may not support reading large-scale real-world graph datasets. This shows a demand for high-performance libraries capable of loading graphs to (i) accelerate the design of new graph algorithms, (ii) evaluate the contributions on a wide range of graph algorithms, and (iii) facilitate easy and fast comparison across different graph frameworks.
To that end, we present ParaGrapher, a high-performance API and library for loading large-scale and compressed graphs. ParaGrapher supports different types of requests for accessing graphs in shared- and distributed-memory and out-of-core graph processing. We explain the design of ParaGrapher and present a performance model of graph decompression, which is used for evaluation of ParaGrapher over three storage types. Our evaluation shows that by decompressing compressed graphs in WebGraph format, ParaGrapher delivers up to 3.2 times speedup in loading and up to 5.2 times speedup in end-to-end execution in comparison to the binary and textual formats.
ParaGrapher is available online at https://blogs.qub.ac.uk/DIPSA/ParaGrapher/.
Submitted 17 June, 2024; v1 submitted 30 April, 2024;
originally announced April 2024.
-
MS-BioGraphs: Sequence Similarity Graph Datasets
Authors:
Mohsen Koohi Esfahani,
Paolo Boldi,
Hans Vandierendonck,
Peter Kilpatrick,
Sebastiano Vigna
Abstract:
Progress in High-Performance Computing in general, and High-Performance Graph Processing in particular, is highly dependent on the availability of publicly-accessible, relevant, and realistic data sets.
To ensure continuation of this progress, we (i) investigate and optimize the process of generating large sequence similarity graphs as an HPC challenge and (ii) demonstrate this process in creating MS-BioGraphs, a new family of publicly available real-world edge-weighted graph datasets with up to 2.5 trillion edges, that is, 6.6 times greater than the largest graph published recently. The largest graph is created by matching (i.e., all-to-all similarity aligning) 1.7 billion protein sequences. The MS-BioGraphs family also includes seven subgraphs with different sizes and direction types.
We describe the two main challenges we faced in generating large graph datasets and our solutions, namely (i) optimizing data structures and algorithms for this multi-step process and (ii) the WebGraph parallel compression technique. We present a comparative study of the structural characteristics of MS-BioGraphs.
The datasets are available online at https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs.
Submitted 31 August, 2023;
originally announced August 2023.
-
Moving Least Squares Approximation using Variably Scaled Discontinuous Weight Function
Authors:
Mohammad Karimnejad Esfahani,
Stefano De Marchi,
Francesco Marchetti
Abstract:
Functions with discontinuities appear in many applications, such as image reconstruction, signal processing, optimal control problems, interface problems, and engineering applications. Accurate approximation and interpolation of these functions are therefore of great importance. In this paper, we design a moving least-squares approach for scattered data approximation that incorporates the discontinuities in the weight functions. The idea is to control the influence of the data sites on the approximant, not only with regard to their distance from the evaluation point, but also with respect to the discontinuity of the underlying function. We also provide an error estimate on a suitable piecewise Sobolev space. The numerical experiments agree with the theoretically derived convergence rate.
Submitted 6 February, 2023;
originally announced February 2023.