
Edge-Disjoint Spanning Trees on Star-Product Networks

Authors: Aleyah Dawkins, Kelly Isham, Ales Kubicek, Kartik Lakhotia, Laura Monroe

Abstract: Star-product graphs are a natural extension of the Cartesian product, but have not been well-studied. We show that many important established and emerging network topologies, including HyperX, SlimFly, BundleFly, PolarStar, mesh, and torus, are in fact star-product graphs. While this connection was known for BundleFly and PolarStar, it was not for the others listed. We extend a method of constructing maximal and near-maximal sets of edge-disjoint spanning trees on Cartesian products to the star product, and thus obtain maximal or near-maximal sets of edge-disjoint spanning trees on new networks of importance, where such sets can improve bandwidth of collective operations and therefore accelerate many important workloads in high-performance computing.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2302.07217 [pdf, other]

PolarStar: Expanding the Scalability Horizon of Diameter-3 Networks

Authors: Kartik Lakhotia, Laura Monroe, Kelly Isham, Maciej Besta, Nils Blach, Torsten Hoefler, Fabrizio Petrini

Abstract: In this paper, we present PolarStar, a novel family of diameter-3 network topologies derived from the star product of two low-diameter factor graphs. The proposed PolarStar construction gives the largest known diameter-3 network topologies for almost all radixes. When compared to state-of-the-art diameter-3 networks, PolarStar achieves 31% geometric mean increase in scale over Bundlefly, 91% over Dragonfly, and 690% over 3-D HyperX. PolarStar has many other desirable properties including a modular layout, large bisection, high resilience to link failures and a large number of feasible sizes for every radix. Our evaluation shows that it exhibits comparable or better performance than other diameter-3 networks under various traffic patterns.

Submitted 14 February, 2023; originally announced February 2023.

Comments: 13 pages, 13 figures, 4 tables

ACM Class: B.4.3; B.4.4; G.2.2

arXiv:2208.01695 [pdf, other]

doi 10.1109/SC41404.2022.00017

PolarFly: A Cost-Effective and Flexible Low-Diameter Topology

Authors: Kartik Lakhotia, Maciej Besta, Laura Monroe, Kelly Isham, Patrick Iff, Torsten Hoefler, Fabrizio Petrini

Abstract: In this paper we present PolarFly, a diameter-2 network topology based on the Erdos-Renyi family of polarity graphs from finite geometry. This is a highly scalable low-diameter topology that asymptotically reaches the Moore bound on the number of nodes for a given network degree and diameter. PolarFly achieves high Moore bound efficiency even for the moderate radixes commonly seen in current and near-future routers, reaching more than 96% of the theoretical peak. It also offers more feasible router degrees than the state-of-the-art solutions, greatly adding to the selection of scalable diameter-2 networks. PolarFly enjoys many other topological properties highly relevant in practice, such as a modular design and expandability that allow incremental growth in network size without rewiring the whole network. Our evaluation shows that PolarFly outperforms competitive networks in terms of scalability, cost and performance for various traffic patterns.

Submitted 2 May, 2023; v1 submitted 2 August, 2022; originally announced August 2022.

Comments: In Proceedings of International Conference for High Performance Computing, Networking, Storage, and Analysis (SC) 2022

ACM Class: B.4.3; B.4.4

arXiv:2112.03229 [pdf, other]

doi 10.1115/1.4052728

SIMD-Optimized Search Over Sorted Data

Authors: Benjamin Mastripolito, Nicholas Koskelo, Dylan Weatherred, David A. Pimentel, Daniel Sheppard, Anna Pietarila Graham, Laura Monroe, Robert Robey

Abstract: Applications often require a fast, single-threaded search algorithm over sorted data, typical in table-lookup operations. We explore various search algorithms for a large number of search candidates over a relatively small array of logarithmically-distributed sorted data. These include an innovative hash-based search that takes advantage of floating point representation to bin data by the exponent. Algorithms that can be optimized to take advantage of SIMD vector instructions are of particular interest. We then conduct a case study applying our results and analyzing algorithmic performance with the EOSPAC package. EOSPAC is a table look-up library for manipulation and interpolation of SESAME equation-of-state data. Our investigation results in a couple of algorithms with better performance with a best case 8x speedup over the original EOSPAC Hunt-and-Locate implementation. Our techniques are generalizable to other instances of search algorithms seeking to get a performance boost from vectorization.

Submitted 6 December, 2021; originally announced December 2021.

Report number: LA-UR-20-30218

Journal ref: J. Comput. Inf. Sci. Eng. Apr 2022, 22(2)

arXiv:2111.05996 [pdf, other]

A Few Identities of the Takagi Function on Dyadic Rationals

Authors: Laura Monroe

Abstract: The number of unbalanced interior nodes of divide-and-conquer trees on $n$ leaves is known to form a sequence of dilations of the Takagi function on dyadic rationals. We use this fact to derive identities on the Takagi function, and on the Hamming weight of an integer in terms of the Takagi function.

Submitted 15 November, 2021; v1 submitted 10 November, 2021; originally announced November 2021.

Comments: 11 pages, 1 figure. This paper is a small portion of arXiv:2108.11496 with some additions, split for publication. Version 2 adds references to another proof of the weight theorem and makes some notation changes

Report number: LA-UR-21-31244

MSC Class: 68R05 (Primary) 26A27 (Secondary); 05C05; 28A80

arXiv:2108.11496 [pdf, other]

A Class of Trees Having Near-Best Balance

Authors: Laura Monroe

Abstract: Full binary trees naturally represent commutative non-associative products. There are many important examples of these products: finite-precision floating-point addition and NAND gates, among others. Balance in such a tree is highly desirable for efficiency in calculation. The best balance is attained with a divide-and-conquer approach. However, this may not be the optimal solution, since the success of many calculations is dependent on the grouping and ordering of the calculation, for reasons ranging from the avoidance of rounding error, to calculating with varying precision, to the placement of calculation within a heterogeneous system. We introduce a new class of computational trees having near-best balance in terms of the Colless index from mathematical phylogenetics. These trees are easily constructed from the binary decomposition of the number of terms in the problem. They also permit much more flexibility than the optimally balanced divide-and-conquer trees. This gives needed freedom in the grouping and ordering of calculation, and allows intelligent efficiency trade-offs.

Submitted 25 August, 2021; originally announced August 2021.

Comments: 60 pages, 10 figures

MSC Class: 05A10; 05C05 (Primary) 65G50; 92D10 (Secondary)

arXiv:2103.05810

Binary Signed-Digit Integers, the Stern Diatomic Sequence and Stern Polynomials

Authors: Laura Monroe

Abstract: Stern's diatomic sequence is a well-studied and simply defined sequence with many fascinating characteristics. The binary signed-digit (BSD) representation of integers is used widely in efficient computation, coding theory and other applications. We link these two objects, showing that the number of $i$-bit binary signed-digit representations of an integer $n<2^i$ is the $(2^i-n)^\text{th}$ element in Stern's diatomic sequence. This correspondence makes the vast range of results known about the Stern diatomic sequence available for consideration in the study of binary signed-digit integers, and vice versa. Applications of this relationship discussed in this paper include a weight-distribution theorem for BSD representations, linking these representations to Stern polynomials, a recursion for the number of optimal BSD representations of an integer along with their Hamming weight, stemming from an easy recursion for the leading coefficients and degrees of Stern polynomials, and the identification of all integers having a maximal number of such representations.

Submitted 30 August, 2021; v1 submitted 9 March, 2021; originally announced March 2021.

Comments: This paper has been subsumed by two other papers, which can be found on arXiv: "Binary Signed-Digit Integers and the Stern Diatomic Sequence" at arxiv:2108.11495, and "Binary Signed-Digit Integers and the Stern Polynomial" at arxiv:2108.12417

MSC Class: 11A63 (Primary); 11B83 (Secondary); 68R01
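
The correspondence stated in the abstract above can be checked by brute force for small cases. The following is a minimal sketch, not code from the paper: the function names `stern`, `count_bsd`, and `check` are hypothetical, and it assumes that an $i$-bit BSD representation means a string of exactly $i$ digits drawn from {-1, 0, 1} with positional weights $2^0, \dots, 2^{i-1}$, and that Stern's sequence is indexed with $s(0)=0$, $s(1)=1$.

```python
from itertools import product


def stern(m):
    """Stern's diatomic sequence: s(0)=0, s(1)=1, s(2k)=s(k), s(2k+1)=s(k)+s(k+1)."""
    if m < 2:
        return m
    if m % 2 == 0:
        return stern(m // 2)
    return stern(m // 2) + stern(m // 2 + 1)


def count_bsd(n, i):
    """Count i-digit strings over {-1, 0, 1} whose weighted sum equals n.

    Assumption: digits[k] carries weight 2**k, and leading zeros are allowed.
    """
    return sum(
        1
        for digits in product((-1, 0, 1), repeat=i)
        if sum(d * 2**k for k, d in enumerate(digits)) == n
    )


def check(i):
    """Compare the brute-force count with s(2**i - n) for every 1 <= n < 2**i."""
    return all(count_bsd(n, i) == stern(2**i - n) for n in range(1, 2**i))


if __name__ == "__main__":
    for i in range(1, 6):
        print(i, check(i))
```

The enumeration is exponential in $i$ (it visits all $3^i$ digit strings), so it is only meant as a sanity check on small inputs rather than a practical way to count representations.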

arXiv:2005.05387 [pdf, ps, other]

Computationally Inequivalent Summations and Their Parenthetic Forms

Authors: Laura Monroe, Vanessa Job

Abstract: Floating-point addition on a finite-precision machine is not associative, so not all mathematically equivalent summations are computationally equivalent. Assuming that they are can lead to numerical error in computations. Proper ordering and parenthesizing is a low-overhead way of mitigating such error in a floating-point summation. Ordered and parenthesized summations fall into equivalence classes. We describe these classes, and the parenthetic forms summations in these classes take. We provide summation-related interpretations for sequences known in other contexts, and give new recursive and closed formulas for sequences not previously related to summation. We also introduce a data structure that facilitates understanding of these objects, and use it to consider certain forms of summation used by default in widely used computer languages. Finally, we relate this data structure to other mathematical constructs from the fields of mathematical analysis and algorithmic analysis.

Submitted 11 May, 2020; originally announced May 2020.

Comments: 24 pages, 6 figures

ACM Class: G.2.1; F.2.2
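
The non-associativity described in the abstract above is easy to observe directly. The following is a minimal sketch, not code from the paper: the function names and the input list are illustrative assumptions, and it relies on IEEE 754 double precision, which is what Python's `float` uses on common platforms. It sums the same four terms under two different parenthesizations and compares both with `math.fsum`, which returns the correctly rounded sum.

```python
import math


def left_to_right(xs):
    """Sum as (((x0 + x1) + x2) + ...), the usual loop order."""
    total = 0.0
    for x in xs:
        total += x
    return total


def pairwise(xs):
    """Sum a non-empty list by splitting it in half recursively."""
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return pairwise(xs[:mid]) + pairwise(xs[mid:])


# Illustrative input: the exact sum is 2.0, but 1.0 is below the spacing
# between adjacent doubles near 1e16, so it can be absorbed when added there.
terms = [1e16, 1.0, -1e16, 1.0]

if __name__ == "__main__":
    print("left to right:", left_to_right(terms))
    print("pairwise:     ", pairwise(terms))
    print("math.fsum:    ", math.fsum(terms))
```

With this input the two parenthesizations disagree with each other and with the exact result, which is the kind of computational inequivalence the abstract refers to.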

Showing 1–8 of 8 results for author: Monroe, L