Showing 1–8 of 8 results for author: Elster, A C

  1. arXiv:2303.08976

    cs.DC

    Towards a Benchmarking Suite for Kernel Tuners

    Authors: Jacob O. Tørring, Ben van Werkhoven, Filip Petrovic, Floris-Jan Willemsen, Jirí Filipovic, Anne C. Elster

    Abstract: As computing systems become more complex, it is becoming harder for programmers to keep their codes optimized as the hardware gets updated. Autotuners try to alleviate this by hiding as many architecture-based optimization details as possible from the user, so that the code can be used efficiently across different generations of systems. In this article we introduce a new benchmark suite for eval…

    Submitted 15 March, 2023; originally announced March 2023.

  2. arXiv:2203.13577

    cs.DC

    Analyzing Search Techniques for Autotuning Image-based GPU Kernels: The Impact of Sample Sizes

    Authors: Jacob O. Tørring, Anne C. Elster

    Abstract: Modern computing systems are increasingly complex, with their multicore CPUs and GPU accelerators changing yearly, if not more often. It has thus become very challenging to write programs that efficiently use the associated complex memory systems and take advantage of the available parallelism. Autotuning addresses this by optimizing parameterized code to the targeted hardware by searching f…

    Submitted 25 March, 2022; originally announced March 2022.

    Comments: 10 pages, 5 figures

  3. arXiv:2103.14409

    cs.DC cs.AI

    LS-CAT: A Large-Scale CUDA AutoTuning Dataset

    Authors: Lars Bjertnes, Jacob O. Tørring, Anne C. Elster

    Abstract: The effectiveness of Machine Learning (ML) methods depends on access to large suitable datasets. In this article, we present how we build the LS-CAT (Large-Scale CUDA AutoTuning) dataset sourced from GitHub for the purpose of training NLP-based ML models. Our dataset includes 19 683 CUDA kernels focused on linear algebra. In addition to the CUDA codes, our LS-CAT dataset contains 5 028 536 associat…

    Submitted 26 March, 2021; originally announced March 2021.

  4. arXiv:2103.08716

    cs.DC

    Autotuning Benchmarking Techniques: A Roofline Model Case Study

    Authors: Jacob Odgård Tørring, Jan Christian Meyer, Anne C. Elster

    Abstract: Peak performance metrics published by vendors often do not correspond to what can be achieved in practice. It is therefore of great interest to do extensive benchmarking on core applications and library routines. Since DGEMM is one of the most used routines in compute-intensive numerical codes, it is typically highly vendor optimized and of great interest for empirical benchmarks. In this paper we show how…

    Submitted 18 March, 2021; v1 submitted 15 March, 2021; originally announced March 2021.

    Comments: 10 pages, 6 figures

  5. arXiv:1605.06399

    cs.DC cs.PL

    ImageCL: An Image Processing Language for Performance Portability on Heterogeneous Systems

    Authors: Thomas L. Falch, Anne C. Elster

    Abstract: Modern computer systems typically combine multicore CPUs with accelerators like GPUs for improved performance and energy efficiency. However, these systems suffer from poor performance portability: code tuned for one device must be retuned to achieve high performance on another. Image processing is increasing in importance, with applications ranging from seismology and medicine to Photoshop.…

    Submitted 20 May, 2016; originally announced May 2016.

  6. Machine Learning Based Auto-tuning for Enhanced OpenCL Performance Portability

    Authors: Thomas L. Falch, Anne C. Elster

    Abstract: Heterogeneous computing, which combines devices with different architectures, is rising in popularity, and promises increased performance combined with reduced energy consumption. OpenCL has been proposed as a standard for programming such systems, and offers functional portability. It does, however, suffer from poor performance portability: code tuned for one device must be re-tuned to achieve goo…

    Submitted 2 June, 2015; originally announced June 2015.

    Comments: This is a pre-print version of an article to be published in the Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). For personal use only

  7. Summary of the First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1)

    Authors: Daniel S. Katz, Sou-Cheng T. Choi, Hilmar Lapp, Ketan Maheshwari, Frank Löffler, Matthew Turk, Marcus D. Hanwell, Nancy Wilkins-Diehr, James Hetherington, James Howison, Shel Swenson, Gabrielle D. Allen, Anne C. Elster, Bruce Berriman, Colin Venters

    Abstract: Challenges related to development, deployment, and maintenance of reusable software for science are becoming a growing concern. Many scientists' research increasingly depends on the quality and availability of software upon which their works are built. To highlight some of these issues and share experiences, the First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1)…

    Submitted 12 June, 2014; v1 submitted 29 April, 2014; originally announced April 2014.

    Comments: Journal of Open Research Software, 2014

  8. arXiv:1309.2357

    cs.CY cs.DC

    Software for Science: Some Personal Reflections

    Authors: Anne C. Elster

    Abstract: As computer systems become more and more complex, software and tools lag more and more behind. This is especially true for scientific software that often demands high performance, and thus needs to take advantage of parallelisms, memory hierarchies and other software and systems. How do we help bridge this ever-increasing gap? This paper describes some of my experiences and thoughts regarding li…

    Submitted 9 September, 2013; originally announced September 2013.

    Comments: 4-page draft for SC13 workshop WSSSPE