Showing 1–4 of 4 results for author: Kottalam, J

  1. arXiv:1806.01270  [pdf, other]

    cs.DC cs.DB physics.data-an stat.CO

    Alchemist: An Apache Spark <=> MPI Interface

    Authors: Alex Gittens, Kai Rothauge, Shusen Wang, Michael W. Mahoney, Jey Kottalam, Lisa Gerhardt, Prabhat, Michael Ringenburg, Kristyn Maschhoff

    Abstract: The Apache Spark framework for distributed computation is popular in the data analytics community due to its ease of use, but its MapReduce-style programming model can incur significant overheads when performing computations that do not map directly onto this model. One way to mitigate these costs is to off-load computations onto MPI codes. In recent work, we introduced Alchemist, a system for the…

    Submitted 3 June, 2018; originally announced June 2018.

    Comments: Accepted for publication in Concurrency and Computation: Practice and Experience, Special Issue on the Cray User Group 2018. arXiv admin note: text overlap with arXiv:1805.11800

  2. arXiv:1805.11800  [pdf, other]

    cs.DC cs.DB physics.data-an stat.CO

    Accelerating Large-Scale Data Analysis by Offloading to High-Performance Computing Libraries using Alchemist

    Authors: Alex Gittens, Kai Rothauge, Shusen Wang, Michael W. Mahoney, Lisa Gerhardt, Prabhat, Jey Kottalam, Michael Ringenburg, Kristyn Maschhoff

    Abstract: Apache Spark is a popular system aimed at the analysis of large data sets, but recent studies have shown that certain computations---in particular, many linear algebra computations that are the basis for solving common machine learning problems---are significantly slower in Spark than when done using libraries written in a high-performance computing framework such as the Message-Passing Interface…

    Submitted 30 May, 2018; originally announced May 2018.

    Comments: Accepted for publication in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, London, UK, 2018

  3. arXiv:1607.01335  [pdf, other]

    cs.DC

    Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies

    Authors: Alex Gittens, Aditya Devarakonda, Evan Racah, Michael Ringenburg, Lisa Gerhardt, Jey Kottalam, Jialin Liu, Kristyn Maschhoff, Shane Canon, Jatin Chhugani, Pramod Sharma, Jiyan Yang, James Demmel, Jim Harrell, Venkat Krishnamurthy, Michael W. Mahoney, Prabhat

    Abstract: We explore the trade-offs of performing linear algebra using Apache Spark, compared to traditional C and MPI implementations on HPC platforms. Spark is designed for data analytics on cluster computing platforms with access to local disks and is optimized for data-parallel tasks. We examine three widely-used and important matrix factorizations: NMF (for physical plausibility), PCA (for its ubiquity…

    Submitted 20 September, 2016; v1 submitted 5 July, 2016; originally announced July 2016.

    ACM Class: G.1.3; C.2.4

  4. arXiv:1310.5426  [pdf, other]

    cs.LG cs.DC stat.ML

    MLI: An API for Distributed Machine Learning

    Authors: Evan R. Sparks, Ameet Talwalkar, Virginia Smith, Jey Kottalam, Xinghao Pan, Joseph Gonzalez, Michael J. Franklin, Michael I. Jordan, Tim Kraska

    Abstract: MLI is an Application Programming Interface designed to address the challenges of building Machine Learning algorithms in a distributed setting based on data-centric computing. Its primary goal is to simplify the development of high-performance, scalable, distributed algorithms. Our initial results show that, relative to existing systems, this interface can be used to build distributed implement…

    Submitted 25 October, 2013; v1 submitted 21 October, 2013; originally announced October 2013.