-
Open Problems in (Hyper)Graph Decomposition
Authors:
Deepak Ajwani,
Rob H. Bisseling,
Katrin Casel,
Ümit V. Çatalyürek,
Cédric Chevalier,
Florian Chudigiewitsch,
Marcelo Fonseca Faraj,
Michael Fellows,
Lars Gottesbüren,
Tobias Heuer,
George Karypis,
Kamer Kaya,
Jakub Lacki,
Johannes Langguth,
Xiaoye Sherry Li,
Ruben Mayer,
Johannes Meintrup,
Yosuke Mizutani,
François Pellegrini,
Fabrizio Petrini,
Frances Rosamond,
Ilya Safro,
Sebastian Schlag,
Christian Schulz,
Roohani Sharma, et al. (4 additional authors not shown)
Abstract:
Large networks are useful in a wide range of applications. Sometimes problem instances are composed of billions of entities. Decomposing and analyzing these structures helps us gain new insights about our surroundings. Even if the final application concerns a different problem (such as traversal, finding paths, trees, and flows), decomposing large graphs is often an important subproblem for complexity reduction or parallelization. This report is a summary of discussions that happened at Dagstuhl seminar 23331 on "Recent Trends in Graph Decomposition" and presents currently open problems and future directions in the area of (hyper)graph decomposition.
Submitted 18 October, 2023;
originally announced October 2023.
-
Minimizing communication in the multidimensional FFT
Authors:
Thomas Koopman,
Rob H. Bisseling
Abstract:
We present a parallel algorithm for the fast Fourier transform (FFT) in higher dimensions. This algorithm generalizes the cyclic-to-cyclic one-dimensional parallel algorithm to a cyclic-to-cyclic multidimensional parallel algorithm while retaining the property of needing only a single all-to-all communication step. This is under the constraint that we use at most $\sqrt{N}$ processors for an FFT on an array with a total of $N$ elements, irrespective of the dimension $d$ or the shape of the array. The only assumption we make is that $N$ is sufficiently composite. Our algorithm starts and ends in the same data distribution.
We present our multidimensional implementation FFTU which utilizes the sequential FFTW program for its local FFTs, and which can handle any dimension $d$. We obtain experimental results for $d\leq 5$ using MPI on up to 4096 cores of the supercomputer Snellius, comparing FFTU with the parallel FFTW program and with PFFT and heFFTe. These results show that FFTU is competitive with the state of the art and that it allows one to use a larger number of processors, while keeping communication limited to a single all-to-all operation. For arrays of size $1024^3$ and $64^5$, FFTU achieves a speedup of a factor 149 and 176, respectively, on 4096 processors.
Submitted 11 December, 2023; v1 submitted 22 March, 2022;
originally announced March 2022.
-
An improved exact algorithm and an NP-completeness proof for sparse matrix bipartitioning
Authors:
Timon E. Knigge,
Rob H. Bisseling
Abstract:
We investigate sparse matrix bipartitioning -- a problem where we minimize the communication volume in parallel sparse matrix-vector multiplication. We prove, by reduction from graph bisection, that this problem is $\mathcal{NP}$-complete in the case where each side of the bipartitioning must contain a linear fraction of the nonzeros.
We present an improved exact branch-and-bound algorithm which finds the minimum communication volume for a given matrix and maximum allowed imbalance. The algorithm is based on a maximum-flow bound and a packing bound, which extend previous matching and packing bounds.
We implemented the algorithm in a new program called MP (Matrix Partitioner), which solved 839 matrices from the SuiteSparse collection to optimality, each within 24 hours of CPU-time. Furthermore, MP solved the difficult problem of the matrix cage6 in about 3 days. The new program is on average more than ten times faster than the previous program MondriaanOpt.
Benchmark results using the set of 839 optimally solved matrices show that combining the medium-grain/iterative refinement methods of the Mondriaan package with the hypergraph bipartitioner of the PaToH package produces sparse matrix bipartitionings on average within 10% of the optimal solution.
Submitted 7 May, 2020; v1 submitted 5 November, 2018;
originally announced November 2018.
-
Exact enumeration of self-avoiding walks on BCC and FCC lattices
Authors:
Raoul D. Schram,
Gerard T. Barkema,
Rob H. Bisseling,
Nathan Clisby
Abstract:
Self-avoiding walks on the body-centered-cubic (BCC) and face-centered-cubic (FCC) lattices are enumerated up to lengths 28 and 24, respectively, using the length-doubling method. Analysis of the enumeration results yields values for the exponents $γ$ and $ν$ which are in agreement with, but less accurate than those obtained earlier from enumeration results on the simple cubic lattice. The non-universal growth constant and amplitudes are accurately determined, yielding for the BCC lattice $μ=6.530520(20)$, $A=1.1785(40)$, and $D=1.0864(50)$, and for the FCC lattice $μ=10.037075(20)$, $A=1.1736(24)$, and $D=1.0460(50)$.
Submitted 27 March, 2017;
originally announced March 2017.
-
SAWdoubler: a program for counting self-avoiding walks
Authors:
Raoul D. Schram,
Gerard T. Barkema,
Rob H. Bisseling
Abstract:
This article presents SAWdoubler, a package for counting the total number Z(N) of self-avoiding walks (SAWs) on a regular lattice by the length-doubling method, of which the basic concept has been published previously by us. We discuss an algorithm for the creation of all SAWs of length N, efficient storage of these SAWs in a tree data structure, and an algorithm for the computation of correction terms to the count Z(2N) for SAWs of double length, removing all combinations of two intersecting single-length SAWs.
We present an efficient numbering of the lattice sites that enables exploitation of symmetry and leads to a smaller tree data structure; this numbering is by increasing Euclidean distance from the origin of the lattice. Furthermore, we show how the computation can be parallelised by distributing the iterations of the main loop of the algorithm over the cores of a multicore architecture. Experimental results on the 3D cubic lattice demonstrate that Z(28) can be computed on a dual-core PC in only 1 hour and 40 minutes, with a speedup of 1.56 compared to the single-core computation and with a gain by using symmetry of a factor of 26. We present results for memory use and show how the computation is made to fit in 4 Gbyte RAM. It is easy to extend the SAWdoubler software to other lattices; it is publicly available under the GNU LGPL license.
Submitted 23 August, 2012;
originally announced August 2012.
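For orientation, the quantity Z(N) in the SAWdoubler abstract above can be checked for very small N by plain depth-first enumeration. The sketch below is not the length-doubling method and is not taken from the SAWdoubler package; the function name `count_saws` and its parameters are hypothetical, and the approach is only practical for short walks because the running time grows exponentially with N.

```python
def count_saws(n, dim=3):
    """Count self-avoiding walks of n steps on the dim-dimensional
    hypercubic lattice by brute-force depth-first search.

    Illustrative only: this is plain enumeration, not length-doubling.
    """
    # Build the 2*dim unit steps (+1 and -1 along each axis).
    steps = []
    for axis in range(dim):
        for sign in (1, -1):
            step = [0] * dim
            step[axis] = sign
            steps.append(tuple(step))

    origin = (0,) * dim
    visited = {origin}  # sites occupied by the walk built so far

    def extend(site, remaining):
        # A walk with no steps left to take counts once.
        if remaining == 0:
            return 1
        total = 0
        for step in steps:
            nxt = tuple(a + b for a, b in zip(site, step))
            if nxt not in visited:
                visited.add(nxt)
                total += extend(nxt, remaining - 1)
                visited.remove(nxt)  # backtrack
        return total

    return extend(origin, n)
```

Calling `count_saws(4)` counts the 4-step walks on the simple cubic lattice, and `count_saws(3, dim=2)` does the same for 3-step walks on the square lattice.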
-
A Geometric Approach to Matrix Ordering
Authors:
B. O. Fagginger Auer,
R. H. Bisseling
Abstract:
We present a recursive way to partition hypergraphs which creates and exploits hypergraph geometry and is suitable for many-core parallel architectures. Such partitionings are then used to bring sparse matrices in a recursive Bordered Block Diagonal form (for processor-oblivious parallel LU decomposition) or recursive Separated Block Diagonal form (for cache-oblivious sparse matrix-vector multiplication). We show that the quality of the obtained partitionings and orderings is competitive by comparing obtained fill-in for LU decomposition with SuperLU (with better results for 8 of the 28 test matrices) and comparing cut sizes for sparse matrix-vector multiplication with Mondriaan (with better results for 4 of the 12 test matrices). The main advantage of the new method is its speed: it is on average 21.6 times faster than Mondriaan.
Submitted 23 May, 2011;
originally announced May 2011.
-
Exact enumeration of self-avoiding walks
Authors:
Raoul D. Schram,
Gerard T. Barkema,
Rob H. Bisseling
Abstract:
A prototypical problem on which techniques for exact enumeration are tested and compared is the enumeration of self-avoiding walks. Here, we show an advance in the methodology of enumeration, making the process thousands or millions of times faster. This allowed us to enumerate self-avoiding walks on the simple cubic lattice up to a length of 36 steps.
Submitted 12 April, 2011;
originally announced April 2011.
-
Towards device-size atomistic models of amorphous silicon
Authors:
R. L. C. Vink,
G. T. Barkema,
M. A. Stijnman,
R. H. Bisseling
Abstract:
The atomic structure of amorphous materials is believed to be well described by the continuous random network model. We present an algorithm for the generation of large, high-quality continuous random networks. The algorithm is a variation of the "sillium" approach introduced by Wooten, Winer, and Weaire. By employing local relaxation techniques, local atomic rearrangements can be tried that scale almost independently of system size. This scaling property of the algorithm paves the way for the generation of realistic device-size atomic networks.
Submitted 17 July, 2001;
originally announced July 2001.
-
Partitioning 3D space for parallel many-particle simulations
Authors:
M. A. Stijnman,
R. H. Bisseling,
G. T. Barkema
Abstract:
In a common approach for parallel processing applied to simulations of many-particle systems with short-ranged interactions and uniform density, the simulation cell is partitioned into domains of equal shape and size, each of which is assigned to one processor. We compare the commonly used simple-cubic (SC) domain shape to domain shapes chosen as the Voronoi cells of BCC and FCC lattices. The latter two are found to result in superior partitionings with respect to communication overhead. Other domain shapes, relevant for a small number of processors, are also discussed. The higher efficiency with BCC and FCC partitionings is demonstrated in simulations of the sillium model for amorphous silicon.
Submitted 14 May, 2001;
originally announced May 2001.
-
DNA electrophoresis studied with the cage model
Authors:
A. van Heukelum,
G. T. Barkema,
R. H. Bisseling
Abstract:
The cage model for polymer reptation, proposed by Evans and Edwards, and its recent extension to model DNA electrophoresis, are studied by numerically exact computation of the drift velocities for polymers with a length L of up to 15 monomers. The computations show the Nernst-Einstein regime (v ~ E) followed by a regime where the velocity decreases exponentially with the applied electric field strength. In agreement with de Gennes' reptation arguments, we find that asymptotically for large polymers the diffusion coefficient D decreases quadratically with polymer length; for the cage model, the proportionality coefficient is DL^2=0.175(2). Additionally we find that the leading correction term for finite polymer lengths scales as N^{-1/2}, where N=L-1 is the number of bonds.
Submitted 3 June, 2002; v1 submitted 31 January, 2001;
originally announced January 2001.