Showing 1–34 of 34 results for author: Rajamanickam, S

  1. arXiv:2405.07898  [pdf, other]

    physics.comp-ph cs.DC cs.ET

    Breaking the Molecular Dynamics Timescale Barrier Using a Wafer-Scale System

    Authors: Kylee Santos, Stan Moore, Tomas Oppelstrup, Amirali Sharifian, Ilya Sharapov, Aidan Thompson, Delyan Z Kalchev, Danny Perez, Robert Schreiber, Scott Pakin, Edgar A Leon, James H Laros III, Michael James, Sivasankaran Rajamanickam

    Abstract: Molecular dynamics (MD) simulations have transformed our understanding of the nanoscale, driving breakthroughs in materials science, computational chemistry, and several other fields, including biophysics and drug design. Even on exascale supercomputers, however, runtimes are excessive for systems and timescales of scientific interest. Here, we demonstrate strong scaling of MD simulations on the C…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 10 pages, 10 figures, 5 tables

  2. arXiv:2304.13194  [pdf, other]

    cs.DC cs.DM

    Jet: Multilevel Graph Partitioning on Graphics Processing Units

    Authors: Michael S. Gilbert, Kamesh Madduri, Erik G. Boman, Sivasankaran Rajamanickam

    Abstract: The multilevel heuristic is the dominant strategy for high-quality sequential and parallel graph partitioning. Partition refinement is a key step of multilevel graph partitioning. In this work, we present Jet, a new parallel algorithm for partition refinement specifically designed for Graphics Processing Units (GPUs). We combine Jet with GPU-aware coarsening to develop a $k$-way graph partitioner,…

    Submitted 5 January, 2024; v1 submitted 25 April, 2023; originally announced April 2023.

    Comments: To appear in SIAM SISC journal

  3. arXiv:2304.04876  [pdf, other]

    math.NA cs.DC cs.MS

    An Experimental Study of Two-Level Schwarz Domain Decomposition Preconditioners on GPUs

    Authors: Ichitaro Yamazaki, Alexander Heinlein, Sivasankaran Rajamanickam

    Abstract: The generalized Dryja--Smith--Widlund (GDSW) preconditioner is a two-level overlapping Schwarz domain decomposition (DD) preconditioner that couples a classical one-level overlapping Schwarz preconditioner with an energy-minimizing coarse space. When used to accelerate the convergence rate of Krylov subspace iterative methods, the GDSW preconditioner provides robustness and scalability for the sol…

    Submitted 10 April, 2023; originally announced April 2023.

    Comments: Accepted for publication in IPDPS'23

  4. arXiv:2303.11499  [pdf, other]

    cs.DC cs.AR

    Exploiting Inter-Operation Data Reuse in Scientific Applications using GOGETA

    Authors: Raveesh Garg, Michael Pellauer, Sivasankaran Rajamanickam, Tushar Krishna

    Abstract: HPC applications are critical in various scientific domains ranging from molecular dynamics to chemistry to fluid dynamics. Conjugate Gradient (CG) is a popular application kernel used in iterative linear HPC solvers and has applications in numerous scientific domains. However, the HPCG benchmark shows that the performance achieved by Top500 HPC systems on CG is a small fraction of the performance…

    Submitted 20 March, 2023; originally announced March 2023.

  5. arXiv:2209.04541  [pdf, other]

    cs.DC cs.DS

    PGAbB: A Block-Based Graph Processing Framework for Heterogeneous Platforms

    Authors: Abdurrahman Yasar, Sivasankaran Rajamanickam, Jonathan W. Berry, Umit V. Catalyurek

    Abstract: Designing flexible graph kernels that can run well on various platforms is a crucial research problem due to the frequent usage of graphs for modeling data and recent architectural advances and variety. In this work, we propose a novel graph processing framework, PGAbB (Parallel Graph Algorithms by Blocks), for modern shared-memory heterogeneous platforms. Our framework implements a block-based pr…

    Submitted 9 September, 2022; originally announced September 2022.

  6. arXiv:2204.02934  [pdf, other]

    cs.DC

    Parallel, Portable Algorithms for Distance-2 Maximal Independent Set and Graph Coarsening

    Authors: Brian Kelley, Sivasankaran Rajamanickam

    Abstract: Given a graph, finding the distance-2 maximal independent set (MIS-2) of the vertices is a problem that is useful in several contexts such as algebraic multigrid coarsening or multilevel graph partitioning. Such multilevel methods rely on finding the independent vertices so they can be used as seeds for aggregation in a multilevel scheme. We present a parallel MIS-2 algorithm to improve performanc…

    Submitted 6 April, 2022; originally announced April 2022.

    Comments: Accepted for publication in IPDPS 2022

  7. arXiv:2201.08916  [pdf, other]

    cs.AR

    Enabling Flexibility for Sparse Tensor Acceleration via Heterogeneity

    Authors: Eric Qin, Raveesh Garg, Abhimanyu Bambhaniya, Michael Pellauer, Angshuman Parashar, Sivasankaran Rajamanickam, Cong Hao, Tushar Krishna

    Abstract: Recently, numerous sparse hardware accelerators for Deep Neural Networks (DNNs), Graph Neural Networks (GNNs), and scientific computing applications have been proposed. A common characteristic among all of these accelerators is that they target tensor algebra (typically matrix multiplications); yet dozens of new accelerators are proposed for every new application. The motivation is that the size a…

    Submitted 21 January, 2022; originally announced January 2022.

  8. arXiv:2109.07419  [pdf, other]

    cs.AR cs.DC cs.LG

    Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators

    Authors: Geonhwa Jeong, Gokcen Kestor, Prasanth Chatarasi, Angshuman Parashar, Po-An Tsai, Sivasankaran Rajamanickam, Roberto Gioiosa, Tushar Krishna

    Abstract: To meet the extreme compute demands for deep learning across commercial and scientific applications, dataflow accelerators are becoming increasingly popular. While these "domain-specific" accelerators are not fully programmable like CPUs and GPUs, they retain varying levels of flexibility with respect to data orchestration, i.e., dataflow and tiling optimizations to enhance efficiency. There are s…

    Submitted 6 November, 2021; v1 submitted 15 September, 2021; originally announced September 2021.

    Comments: This paper is accepted to PACT 2021

  9. arXiv:2109.01232  [pdf, other]

    cs.DC cs.MS math.NA

    A Study of Mixed Precision Strategies for GMRES on GPUs

    Authors: Jennifer A. Loe, Christian A. Glusa, Ichitaro Yamazaki, Erik G. Boman, Sivasankaran Rajamanickam

    Abstract: Support for lower precision computation is becoming more common in accelerator hardware due to lower power usage, reduced data movement and increased computational performance. However, computational science and engineering (CSE) problems require double precision accuracy in several domains. This conflict between hardware trends and application needs has resulted in a need for mixed precision stra…

    Submitted 2 September, 2021; originally announced September 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2105.07544

  10. arXiv:2107.00075  [pdf, other]

    cs.DC cs.DM

    Parallel Graph Coloring Algorithms for Distributed GPU Environments

    Authors: Ian Bogle, Erik G Boman, Karen D Devine, Sivasankaran Rajamanickam, George M Slota

    Abstract: Graph coloring is often used in parallelizing scientific computations that run in distributed and multi-GPU environments; it identifies sets of independent data that can be updated in parallel. Many algorithms exist for graph coloring on a single GPU or in distributed memory, but to the best of our knowledge, hybrid MPI+GPU algorithms have been unexplored until this work. We present several MPI+GP…

    Submitted 30 June, 2021; originally announced July 2021.

    Comments: Submitted to Parallel Computing

  11. arXiv:2106.10499  [pdf, other]

    cs.DC cs.AI cs.AR

    Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication

    Authors: Gordon E. Moon, Hyoukjun Kwon, Geonhwa Jeong, Prasanth Chatarasi, Sivasankaran Rajamanickam, Tushar Krishna

    Abstract: There is a growing interest in custom spatial accelerators for machine learning applications. These accelerators employ a spatial array of processing elements (PEs) interacting via custom buffer hierarchies and networks-on-chip. The efficiency of these accelerators comes from employing optimized dataflow (i.e., spatial/temporal partitioning of data across the PEs and fine-grained scheduling) strat…

    Submitted 19 June, 2021; originally announced June 2021.

  12. arXiv:2105.07544  [pdf, other]

    math.NA cs.MS

    Experimental Evaluation of Multiprecision Strategies for GMRES on GPUs

    Authors: Jennifer A. Loe, Christian A. Glusa, Ichitaro Yamazaki, Erik G. Boman, Sivasankaran Rajamanickam

    Abstract: Support for lower precision computation is becoming more common in accelerator hardware due to lower power usage, reduced data movement and increased computational performance. However, computational science and engineering (CSE) problems require double precision accuracy in several domains. This conflict between hardware trends and application needs has resulted in a need for multiprecision strat…

    Submitted 16 May, 2021; originally announced May 2021.

    Comments: Accepted for publication in the IEEE IPDPS Accelerators and Hybrid Emerging Systems (AsHES) 11th Workshop, 2021

  13. arXiv:2105.00578  [pdf, other]

    cs.DC cs.DM cs.MS

    Sphynx: a parallel multi-GPU graph partitioner for distributed-memory systems

    Authors: Seher Acer, Erik G Boman, Christian A Glusa, Sivasankaran Rajamanickam

    Abstract: Graph partitioning has been an important tool to partition the work among several processors to minimize the communication cost and balance the workload. While accelerator-based supercomputers are emerging to be the standard, the use of graph partitioning becomes even more important as applications are rapidly moving to these architectures. However, there is no distributed-memory parallel, multi-G…

    Submitted 2 May, 2021; originally announced May 2021.

    Comments: To appear in Parallel Computing

    Report number: SAND2021-0352-O MSC Class: 68W10

  14. arXiv:2104.01196  [pdf, other]

    math.NA cs.MS

    Two-Stage Gauss--Seidel Preconditioners and Smoothers for Krylov Solvers on a GPU cluster

    Authors: Luc Berger-Vergiat, Brian Kelley, Sivasankaran Rajamanickam, Jonathan Hu, Katarzyna Swirydowicz, Paul Mullowney, Stephen Thomas, Ichitaro Yamazaki

    Abstract: Gauss-Seidel (GS) relaxation is often employed as a preconditioner for a Krylov solver or as a smoother for Algebraic Multigrid (AMG). However, the requisite sparse triangular solve is difficult to parallelize on many-core architectures such as graphics processing units (GPUs). In the present study, the performance of the traditional GS relaxation based on a triangular solve is compared with two-s…

    Submitted 24 April, 2021; v1 submitted 2 April, 2021; originally announced April 2021.

  15. arXiv:2103.11991  [pdf, other]

    cs.MS

    Kokkos Kernels: Performance Portable Sparse/Dense Linear Algebra and Graph Kernels

    Authors: Sivasankaran Rajamanickam, Seher Acer, Luc Berger-Vergiat, Vinh Dang, Nathan Ellingwood, Evan Harvey, Brian Kelley, Christian R. Trott, Jeremiah Wilke, Ichitaro Yamazaki

    Abstract: As hardware architectures are evolving in the push towards exascale, developing Computational Science and Engineering (CSE) applications depend on performance portable approaches for sustainable software development. This paper describes one aspect of performance portability with respect to developing a portable library of kernels that serve the needs of several CSE applications and software frame…

    Submitted 22 March, 2021; originally announced March 2021.

    Report number: SAND2021-3421 O

  16. arXiv:2103.10484  [pdf, other]

    cs.LG cs.CV

    Concentric Spherical GNN for 3D Representation Learning

    Authors: James Fox, Bo Zhao, Sivasankaran Rajamanickam, Rampi Ramprasad, Le Song

    Abstract: Learning 3D representations that generalize well to arbitrarily oriented inputs is a challenge of practical importance in applications varying from computer vision to physics and chemistry. We propose a novel multi-resolution convolutional architecture for learning over concentric spherical feature maps, of which the single sphere representation is a special case. Our hierarchical architecture is…

    Submitted 18 March, 2021; originally announced March 2021.

    Comments: This paper has been submitted for conference review

  17. arXiv:2103.10452  [pdf]

    cs.DC

    Extending Sparse Tensor Accelerators to Support Multiple Compression Formats

    Authors: Eric Qin, Geonhwa Jeong, William Won, Sheng-Chun Kao, Hyoukjun Kwon, Sudarshan Srinivasan, Dipankar Das, Gordon E. Moon, Sivasankaran Rajamanickam, Tushar Krishna

    Abstract: Sparsity, which occurs in both scientific applications and Deep Learning (DL) models, has been a key target of optimization within recent ASIC accelerators due to the potential memory and compute savings. These applications use data stored in a variety of compression formats. We demonstrate that both the compactness of different compression formats and the compute efficiency of the algorithms enab…

    Submitted 18 March, 2021; originally announced March 2021.

    Comments: Accepted for publication at the 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021)

  18. arXiv:2103.07977  [pdf, other]

    cs.DC cs.AR

    Understanding the Design-Space of Sparse/Dense Multiphase GNN dataflows on Spatial Accelerators

    Authors: Raveesh Garg, Eric Qin, Francisco Muñoz-Martínez, Robert Guirado, Akshay Jain, Sergi Abadal, José L. Abellán, Manuel E. Acacio, Eduard Alarcón, Sivasankaran Rajamanickam, Tushar Krishna

    Abstract: Graph Neural Networks (GNNs) have garnered a lot of recent interest because of their success in learning representations from graph-structured data across several critical applications in cloud and HPC. Owing to their unique compute and memory characteristics that come from an interplay between dense and sparse phases of computations, the emergence of reconfigurable dataflow (aka spatial) accelera…

    Submitted 6 March, 2022; v1 submitted 14 March, 2021; originally announced March 2021.

    Comments: Accepted for publication at the 36th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2022)

  19. arXiv:2012.12871  [pdf, other]

    cs.CL cs.AI

    A Multimodal Framework for the Detection of Hateful Memes

    Authors: Phillip Lippe, Nithin Holla, Shantanu Chandra, Santhosh Rajamanickam, Georgios Antoniou, Ekaterina Shutova, Helen Yannakoudakis

    Abstract: An increasingly common expression of online hate speech is multimodal in nature and comes in the form of memes. Designing systems to automatically detect hateful content is of paramount importance if we are to mitigate its undesirable effects on the society at large. The detection of multimodal hate speech is an intrinsically difficult and open problem: memes convey a message using both images and…

    Submitted 24 December, 2020; v1 submitted 23 December, 2020; originally announced December 2020.

    Journal ref: PMLR 133:344-360, 2021

  20. arXiv:2010.04905  [pdf, other]

    cond-mat.mtrl-sci cs.LG physics.comp-ph

    Accelerating Finite-temperature Kohn-Sham Density Functional Theory with Deep Neural Networks

    Authors: J. Austin Ellis, Lenz Fiedler, Gabriel A. Popoola, Normand A. Modine, J. Adam Stephens, Aidan P. Thompson, Attila Cangi, Sivasankaran Rajamanickam

    Abstract: We present a numerical modeling workflow based on machine learning (ML) which reproduces the total energies produced by Kohn-Sham density functional theory (DFT) at finite electronic temperature to within chemical accuracy at negligible computational cost. Based on deep neural networks, our workflow yields the local density of states (LDOS) for a given atomic configuration. From the LDOS, spat…

    Submitted 9 July, 2021; v1 submitted 10 October, 2020; originally announced October 2020.

    Journal ref: Phys. Rev. B 104, 035120 (2021)

  21. arXiv:2009.12457  [pdf, other]

    cs.DS

    A Block-Based Triangle Counting Algorithm on Heterogeneous Environments

    Authors: Abdurrahman Yaşar, Sivasankaran Rajamanickam, Jonathan Berry, Ümit V. Çatalyürek

    Abstract: Triangle counting is a fundamental building block in graph algorithms. In this paper, we propose a block-based triangle counting algorithm to reduce data movement during both sequential and parallel execution. Our block-based formulation makes the algorithm naturally suitable for heterogeneous architectures. The problem of partitioning the adjacency matrix of a graph is well-studied. Our task deco…

    Submitted 25 September, 2020; originally announced September 2020.

  22. arXiv:2007.06674  [pdf, other]

    cs.MS math.NA

    A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic

    Authors: Ahmad Abdelfattah, Hartwig Anzt, Erik G. Boman, Erin Carson, Terry Cojean, Jack Dongarra, Mark Gates, Thomas Grützmacher, Nicholas J. Higham, Sherry Li, Neil Lindquist, Yang Liu, Jennifer Loe, Piotr Luszczek, Pratik Nayak, Sri Pranesh, Siva Rajamanickam, Tobias Ribizel, Barry Smith, Kasia Swirydowicz, Stephen Thomas, Stanimire Tomov, Yaohung M. Tsai, Ichitaro Yamazaki, Urike Meier Yang

    Abstract: Within the past years, hardware vendors have started designing low precision special function units in response to the demand of the Machine Learning community and their demand for high compute power in low precision formats. Also the server-line products are increasingly featuring low-precision special function units, such as the NVIDIA tensor cores in ORNL's Summit supercomputer providing more t…

    Submitted 13 July, 2020; originally announced July 2020.

    Comments: Technical report as a part of the Exascale computing project (ECP)

    ACM Class: G.1.3; G.4

  23. arXiv:2005.14028  [pdf, other]

    cs.CL cs.LG

    Joint Modelling of Emotion and Abusive Language Detection

    Authors: Santhosh Rajamanickam, Pushkar Mishra, Helen Yannakoudakis, Ekaterina Shutova

    Abstract: The rise of online communication platforms has been accompanied by some undesirable effects, such as the proliferation of aggressive and abusive behaviour online. Aiming to tackle this problem, the natural language processing (NLP) community has experimented with a range of techniques for abuse detection. While achieving substantial success, these methods have so far only focused on modelling the…

    Submitted 28 May, 2020; originally announced May 2020.

    Comments: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

  24. arXiv:1912.10206  [pdf, other]

    cs.LG stat.ML

    How Robust Are Graph Neural Networks to Structural Noise?

    Authors: James Fox, Sivasankaran Rajamanickam

    Abstract: Graph neural networks (GNNs) are an emerging model for learning graph embeddings and making predictions on graph structured data. However, robustness of graph neural networks is not yet well-understood. In this work, we focus on node structural identity predictions, where a representative GNN model is able to achieve near-perfect accuracy. We also show that the same GNN model is not robust to addi…

    Submitted 21 December, 2019; originally announced December 2019.

    Comments: Accepted workshop paper at Deep Learning on Graphs: Methodologies and Applications (DLGMA'20)

  25. arXiv:1808.08172  [pdf, other]

    math.NA cs.DC

    Asynchronous One-Level and Two-Level Domain Decomposition Solvers

    Authors: Christian Glusa, Paritosh Ramanan, Erik G. Boman, Edmond Chow, Sivasankaran Rajamanickam

    Abstract: Parallel implementations of linear iterative solvers generally alternate between phases of data exchange and phases of local computation. Increasingly large problem sizes on more heterogeneous systems make load balancing and network layout very challenging tasks. In particular, global communication patterns such as inner products become increasingly limiting at scale. We explore the use of asynchr…

    Submitted 10 August, 2020; v1 submitted 24 August, 2018; originally announced August 2018.

    MSC Class: 68W10; 65Y05; 68W15; 65N55

  26. arXiv:1804.09798  [pdf, other]

    cs.DC

    Geometric Partitioning and Ordering Strategies for Task Mapping on Parallel Computers

    Authors: Mehmet Deveci, Karen D. Devine, Kevin Pedretti, Mark A. Taylor, Sivasankaran Rajamanickam, Umit V. Catalyurek

    Abstract: We present a new method for mapping applications' MPI tasks to cores of a parallel computer such that applications' communication time is reduced. We address the case of sparse node allocation, where the nodes assigned to a job are not necessarily located in a contiguous block nor within close proximity to each other in the network, although our methods generalize to contiguous allocations as well…

    Submitted 25 April, 2018; originally announced April 2018.

    Report number: SAND2018-4335R

  27. arXiv:1804.00695  [pdf, other]

    cs.DC

    Sparse Matrix-Matrix Multiplication on Multilevel Memory Architectures : Algorithms and Experiments

    Authors: Mehmet Deveci, Simon D. Hammond, Michael M. Wolf, Sivasankaran Rajamanickam

    Abstract: Architectures with multiple classes of memory media are becoming a common part of mainstream supercomputer deployments. So called multi-level memories offer differing characteristics for each memory component including variation in bandwidth, latency and capacity. This paper investigates the performance of sparse matrix multiplication kernels on two leading high-performance computing architectures…

    Submitted 2 April, 2018; originally announced April 2018.

    Report number: SAND2018-3428 R

  28. arXiv:1801.03065  [pdf, other]

    cs.DC

    Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures

    Authors: Mehmet Deveci, Christian Trott, Sivasankaran Rajamanickam

    Abstract: Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architect…

    Submitted 9 January, 2018; originally announced January 2018.

    Report number: SAND2018-0186 R

  29. arXiv:1712.07297  [pdf, other]

    math.NA cs.MS

    A distributed-memory hierarchical solver for general sparse linear systems

    Authors: Chao Chen, Hadi Pouransari, Sivasankaran Rajamanickam, Erik G. Boman, Eric Darve

    Abstract: We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it exploits the low-rank structure of fill-in blocks. Depending on the accuracy of low-rank approximations, the hierarchical solver can be used either as a direct s…

    Submitted 19 December, 2017; originally announced December 2017.

    MSC Class: 65F50

  30. arXiv:1701.00503  [pdf, other]

    cs.DC

    Distributed Graph Layout for Scalable Small-world Network Analysis

    Authors: George M Slota, Sivasankaran Rajamanickam, Kamesh Madduri

    Abstract: The in-memory graph layout or organization has a considerable impact on the time and energy efficiency of distributed memory graph computations. It affects memory locality, inter-task load balance, communication time, and overall memory utilization. Graph layout could refer to partitioning or replication of vertex and edge arrays, selective replication of data structures that hold meta-data, and r…

    Submitted 2 January, 2017; originally announced January 2017.

  31. arXiv:1610.07220  [pdf, other]

    cs.DC

    Partitioning Trillion-edge Graphs in Minutes

    Authors: George M Slota, Sivasankaran Rajamanickam, Karen Devine, Kamesh Madduri

    Abstract: We introduce XtraPuLP, a new distributed-memory graph partitioner designed to process trillion-edge graphs. XtraPuLP is based on the scalable label propagation community detection technique, which has been demonstrated as a viable means to produce high quality partitions with minimal computation time. On a collection of large sparse graphs, we show that XtraPuLP partitioning quality is comparable…

    Submitted 23 October, 2016; originally announced October 2016.

  32. arXiv:1601.05871  [pdf, other]

    cs.MS

    Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout

    Authors: Kyungjoo Kim, Sivasankaran Rajamanickam, George Stelle, H. Carter Edwards, Stephen L. Olivier

    Abstract: We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block layout. The algorithm-by-blocks approach induces a task graph for the factorization. These tasks are inter-related to each other through their data dependences in t…

    Submitted 21 January, 2016; originally announced January 2016.

    Comments: 25 pages

    Report number: SAND2016-0637 R MSC Class: 68W10

  33. arXiv:1601.05725  [pdf, other]

    cs.DC

    Basker: A Threaded Sparse LU Factorization Utilizing Hierarchical Parallelism and Data Layouts

    Authors: Joshua Dennis Booth, Sivasankaran Rajamanickam, Heidi K. Thornquist

    Abstract: Scalable sparse LU factorization is critical for efficient numerical simulation of circuits and electrical power grids. In this work, we present a new scalable sparse direct solver called Basker. Basker introduces a new algorithm to parallelize the Gilbert-Peierls algorithm for sparse LU factorization. As architectures evolve, there exists a need for algorithms that are hierarchical in nature to m…

    Submitted 21 January, 2016; originally announced January 2016.

  34. arXiv:1511.03703  [pdf, other]

    cs.MS cs.CE

    Embedded Ensemble Propagation for Improving Performance, Portability and Scalability of Uncertainty Quantification on Emerging Computational Architectures

    Authors: E. Phipps, M. D'Elia, H. C. Edwards, M. Hoemmen, J. Hu, S. Rajamanickam

    Abstract: Quantifying simulation uncertainties is a critical component of rigorous predictive simulation. A key component of this is forward propagation of uncertainties in simulation input data to output quantities of interest. Typical approaches involve repeated sampling of the simulation over the uncertain input data, and can require numerous samples when accurately propagating uncertainties from large n…

    Submitted 11 November, 2015; originally announced November 2015.

    Report number: SAND2015-9921 J