
Showing 1–5 of 5 results for author: Deveci, M

  1. arXiv:2011.03641  [pdf, other]

    cs.LG cs.DC

    Exploring the limits of Concurrency in ML Training on Google TPUs

    Authors: Sameer Kumar, James Bradbury, Cliff Young, Yu Emma Wang, Anselm Levskaya, Blake Hechtman, Dehao Chen, HyoukJoong Lee, Mehmet Deveci, Naveen Kumar, Pankaj Kanwar, Shibo Wang, Skye Wanderman-Milne, Steve Lacy, Tao Wang, Tayo Oguntebi, Yazhou Zu, Yuanzhong Xu, Andy Swing

    Abstract: Recent results in language understanding using neural networks have required training hardware of unprecedented scale, with thousands of chips cooperating on a single training run. This paper presents techniques to scale ML models on the Google TPU Multipod, a mesh with 4096 TPU-v3 chips. We discuss model parallelism to overcome scaling limitations from the fixed batch size in data parallelism, commu…

    Submitted 15 March, 2021; v1 submitted 6 November, 2020; originally announced November 2020.

  2. arXiv:1804.09798  [pdf, other]

    cs.DC

    Geometric Partitioning and Ordering Strategies for Task Mapping on Parallel Computers

    Authors: Mehmet Deveci, Karen D. Devine, Kevin Pedretti, Mark A. Taylor, Sivasankaran Rajamanickam, Umit V. Catalyurek

    Abstract: We present a new method for mapping applications' MPI tasks to cores of a parallel computer such that applications' communication time is reduced. We address the case of sparse node allocation, where the nodes assigned to a job are not necessarily located in a contiguous block nor within close proximity to each other in the network, although our methods generalize to contiguous allocations as well…

    Submitted 25 April, 2018; originally announced April 2018.

    Report number: SAND2018-4335R

  3. arXiv:1804.00695  [pdf, other]

    cs.DC

    Sparse Matrix-Matrix Multiplication on Multilevel Memory Architectures: Algorithms and Experiments

    Authors: Mehmet Deveci, Simon D. Hammond, Michael M. Wolf, Sivasankaran Rajamanickam

    Abstract: Architectures with multiple classes of memory media are becoming a common part of mainstream supercomputer deployments. So-called multi-level memories offer differing characteristics for each memory component, including variation in bandwidth, latency and capacity. This paper investigates the performance of sparse matrix multiplication kernels on two leading high-performance computing architectures…

    Submitted 2 April, 2018; originally announced April 2018.

    Report number: SAND2018-3428 R

  4. arXiv:1801.03065  [pdf, other]

    cs.DC

    Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures

    Authors: Mehmet Deveci, Christian Trott, Sivasankaran Rajamanickam

    Abstract: Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architect…

    Submitted 9 January, 2018; originally announced January 2018.

    Report number: SAND2018-0186 R

  5. arXiv:1303.1379  [pdf, other]

    cs.DC

    GPU accelerated maximum cardinality matching algorithms for bipartite graphs

    Authors: Mehmet Deveci, Kamer Kaya, Bora Ucar, Umit V. Catalyurek

    Abstract: We design, implement, and evaluate GPU-based algorithms for the maximum cardinality matching problem in bipartite graphs. Such algorithms have a variety of applications in computer science, scientific computing, bioinformatics, and other areas. To the best of our knowledge, ours is the first study which focuses on GPU implementation of the maximum cardinality matching algorithms. We compare the pr…

    Submitted 6 March, 2013; originally announced March 2013.

    Comments: 14 pages, 5 figures