Showing 1–19 of 19 results for author: Pericas, M

  1. arXiv:2311.05284  [pdf, other]

    cs.DC

    Challenges and Opportunities in the Co-design of Convolutions and RISC-V Vector Processors

    Authors: Sonia Rani Gupta, Nikela Papadopoulou, Miquel Pericàs

    Abstract: The RISC-V "V" extension introduces vector processing to the RISC-V architecture. Unlike most SIMD extensions, it supports long vectors which can result in significant improvement of multiple applications. In this paper, we present our ongoing research to implement and optimize a vectorized Winograd algorithm used in convolutional layers on RISC-V Vector (RISC-VV) processors. Our study identifies e…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: To appear at the Second International workshop on RISC-V for HPC, co-located with SC 2023

  2. arXiv:2311.05267  [pdf, other]

    cs.DC

    Analysis and Characterization of Performance Variability for OpenMP Runtime

    Authors: Minyu Cui, Nikela Papadopoulou, Miquel Pericàs

    Abstract: In the high performance computing (HPC) domain, performance variability is a major scalability issue for parallel computing applications with heavy synchronization and communication. In this paper, we present an experimental performance analysis of OpenMP benchmarks regarding the variation of execution time, and determine the potential factors causing performance variability. Our work offers some…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: To appear at ROSS 2023 (International Workshop on Runtime and Operating Systems for Supercomputers), held in conjunction with SC23

  3. arXiv:2306.04615  [pdf, ps, other]

    cs.DC

    JOSS: Joint Exploration of CPU-Memory DVFS and Task Scheduling for Energy Efficiency

    Authors: Jing Chen, Madhavan Manivannan, Bhavishya Goel, Miquel Pericàs

    Abstract: Energy-efficient execution of task-based parallel applications is crucial as tasking is a widely supported feature in many parallel programming libraries and runtimes. Currently, state-of-the-art proposals primarily rely on leveraging core asymmetry and CPU DVFS. Additionally, these proposals mostly use heuristics and lack the ability to explore the trade-offs between energy usage and performance.…

    Submitted 7 June, 2023; originally announced June 2023.

  4. arXiv:2306.01679  [pdf, other]

    cs.DC

    ODIN: Overcoming Dynamic Interference in iNference pipelines

    Authors: Pirah Noor Soomro, Nikela Papadopoulou, Miquel Pericàs

    Abstract: As an increasing number of businesses becomes powered by machine-learning, inference becomes a core operation, with a growing trend to be offered as a service. In this context, the inference task must meet certain service-level objectives (SLOs), such as high throughput and low latency. However, these targets can be compromised by interference caused by long- or short-lived co-located tasks. Prior…

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: To appear at Euro-Par 2023

  5. arXiv:2212.11574  [pdf, other]

    cs.DC

    Accelerating CNN inference on long vector architectures via co-design

    Authors: Sonia Rani Gupta, Nikela Papadopoulou, Miquel Pericas

    Abstract: CPU-based inference can be an alternative to off-chip accelerators, and vector architectures are a promising option due to their efficiency. However, the large design space of convolutional algorithms and hardware implementations makes it challenging to select the best options. This paper presents ongoing research into co-designing vector architectures for CPU-based CNN inference, focusing on the…

    Submitted 22 December, 2022; originally announced December 2022.

    Comments: To appear at IPDPS 2023

  6. Designing an exploratory phase 2b platform trial in NASH with correlated, co-primary binary endpoints

    Authors: Elias Laurin Meyer, Peter Mesenbrink, Nicholas A. Di Prospero, Juan M. Pericàs, Ekkehard Glimm, Vlad Ratziu, Elena Sena, Franz König

    Abstract: Non-alcoholic steatohepatitis (NASH) is the progressive form of nonalcoholic fatty liver disease (NAFLD) and a disease with high unmet medical need. Platform trials provide great benefits for sponsors and trial participants in terms of accelerating drug development programs. In this article, we describe some of the activities of the EU-PEARL consortium (EU Patient-cEntric clinicAl tRial pLatforms)…

    Submitted 11 January, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

  7. arXiv:2209.04317  [pdf, other]

    cs.DC

    Energy-Efficiency Evaluation of OpenMP Loop Transformations and Runtime Constructs

    Authors: Henrik Valter, Axel Karlsson, Miquel Pericàs

    Abstract: OpenMP is the de facto API for parallel programming in HPC applications. These programs are often computed in data centers, where energy consumption is a major issue. Whereas previous work has focused almost entirely on performance, we here analyse aspects of OpenMP from an energy consumption perspective. This analysis is accomplished by executing novel microbenchmarks and common benchmark suites…

    Submitted 9 September, 2022; originally announced September 2022.

  8. arXiv:2204.02235  [pdf, other]

    cs.DC

    At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads

    Authors: Jens Domke, Emil Vatai, Balazs Gerofi, Yuetsu Kodama, Mohamed Wahib, Artur Podobas, Sparsh Mittal, Miquel Pericàs, Lingqi Zhang, Peng Chen, Aleksandr Drozd, Satoshi Matsuoka

    Abstract: Over the last three decades, innovations in the memory subsystem were primarily targeted at overcoming the data movement bottleneck. In this paper, we focus on a specific market trend in memory technology: 3D-stacked memory and caches. We investigate the impact of extending the on-chip memory capabilities in future HPC-focused processors, particularly by 3D-stacked SRAM. First, we propose a method…

    Submitted 16 October, 2023; v1 submitted 5 April, 2022; originally announced April 2022.

  9. arXiv:2202.11575  [pdf, other]

    cs.PF cs.LG

    Shisha: Online scheduling of CNN pipelines on heterogeneous architectures

    Authors: Pirah Noor Soomro, Mustafa Abduljabbar, Jeronimo Castrillon, Miquel Pericàs

    Abstract: Chiplets have become a common methodology in modern chip design. Chiplets improve yield and enable heterogeneity at the level of cores, memory subsystem and the interconnect. Convolutional Neural Networks (CNNs) have high computational, bandwidth and memory capacity requirements owing to the increasingly large amount of weights. Thus to exploit chiplet-based architectures, CNNs must be optimized i…

    Submitted 4 December, 2022; v1 submitted 23 February, 2022; originally announced February 2022.

  10. ERASE: Energy Efficient Task Mapping and Resource Management for Work Stealing Runtimes

    Authors: Jing Chen, Madhavan Manivannan, Mustafa Abduljabbar, Miquel Pericàs

    Abstract: Parallel applications often rely on work stealing schedulers in combination with fine-grained tasking to achieve high performance and scalability. However, reducing the total energy consumption in the context of work stealing runtimes is still challenging, particularly when using asymmetric architectures with different types of CPU cores. A common approach for energy savings involves dynamic volta…

    Submitted 28 January, 2022; originally announced January 2022.

  11. arXiv:2112.09509  [pdf, other]

    cs.DC cs.PF

    Mitigating inefficient task mappings with an Adaptive Resource-Moldable Scheduler (ARMS)

    Authors: Mustafa Abduljabbar, Mahmoud Eljammaly, Miquel Pericas

    Abstract: Efficient runtime task scheduling on complex memory hierarchy becomes increasingly important as modern and future High-Performance Computing (HPC) systems are progressively composed of multisocket and multi-chiplet nodes with nonuniform memory access latencies. Existing locality-aware scheduling schemes either require control of the data placement policy for memory-bound tasks or maximize locality…

    Submitted 17 December, 2021; originally announced December 2021.

  12. arXiv:2102.11528  [pdf, other]

    cs.AR

    CBP: Coordinated management of cache partitioning, bandwidth partitioning and prefetch throttling

    Authors: Nadja Ramhöj Holtryd, Madhavan Manivannan, Per Stenström, Miquel Pericàs

    Abstract: Reducing the average memory access time is crucial for improving the performance of applications running on multi-core architectures. With workload consolidation this becomes increasingly challenging due to shared resource contention. Techniques for partitioning of shared resources - cache and bandwidth - and prefetching throttling have been proposed to mitigate contention and reduce the average m…

    Submitted 23 February, 2021; originally announced February 2021.

  13. arXiv:2009.00915  [pdf, other]

    cs.DC

    Scheduling Task-parallel Applications in Dynamically Asymmetric Environments

    Authors: Jing Chen, Pirah Noor Soomro, Mustafa Abduljabbar, Madhavan Manivannan, Miquel Pericas

    Abstract: Shared resource interference is observed by applications as dynamic performance asymmetry. Prior art has developed approaches to reduce the impact of performance asymmetry mainly at the operating system and architectural levels. In this work, we study how application-level scheduling techniques can leverage moldability (i.e. flexibility to work as either single-threaded or multithreaded task) and…

    Submitted 22 September, 2020; v1 submitted 2 September, 2020; originally announced September 2020.

    Comments: Published in ICPP Workshops '20

  14. arXiv:2005.07619   

    cs.DC

    Proceedings of the Thirteenth International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG-2020)

    Authors: Miquel Pericas, Oscar Palomar, Vassilis Papaefstathiou, Mahmoud Eljammaly

    Abstract: This volume contains the proceedings of the 13th International Workshop on Programmability and Architectures for Heterogeneous Multicores. The workshop was held in conjunction with the 16th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC) in Bologna, Italy on January 20th, 2020.

    Submitted 10 January, 2020; originally announced May 2020.

    Comments: This volume contains 3 full papers and 3 position papers

  15. arXiv:1912.01563  [pdf, other]

    cs.DC

    LEGaTO: Low-Energy, Secure, and Resilient Toolset for Heterogeneous Computing

    Authors: B. Salami, K. Parasyris, A. Cristal, O. Unsal, X. Martorell, P. Carpenter, R. De La Cruz, L. Bautista, D. Jimenez, C. Alvarez, S. Nabavi, S. Madonar, M. Pericas, P. Trancoso, M. Abduljabbar, J. Chen, P. N. Soomro, M. Manivannan, M. Berge, S. Krupop, F. Klawonn, Al Mekhlafi, S. May, T. Becker, G. Gaydadjiev, et al. (20 additional authors not shown)

    Abstract: The LEGaTO project leverages task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC, balanced with the security and resilience challenges. LEGaTO is an ongoing three-year EU H2020 project started in…

    Submitted 1 December, 2019; originally announced December 2019.

    Comments: 6 pages, 9 figures

  16. arXiv:1911.05114  [pdf, other]

    cs.AR

    Coordinated Management of Processor Configuration and Cache Partitioning to Optimize Energy under QoS Constraints

    Authors: Mehrzad Nejat, Madhavan Manivannan, Miquel Pericas, Per Stenstrom

    Abstract: An effective way to improve energy efficiency is to throttle hardware resources to meet a certain performance target, specified as a QoS constraint, associated with all applications running on a multicore system. Prior art has proposed resource management (RM) frameworks in which the share of the last-level cache (LLC) assigned to each processor and the voltage-frequency (VF) setting for each pr…

    Submitted 12 November, 2019; originally announced November 2019.

    Comments: Submitted to the 34th IEEE International Parallel & Distributed Processing Symposium (IPDPS2020)

  17. arXiv:1911.05101  [pdf, other]

    cs.AR

    Coordinated Management of DVFS and Cache Partitioning under QoS Constraints to Save Energy in Multi-Core Systems

    Authors: Mehrzad Nejat, Madhavan Manivannan, Miquel Pericas, Per Stenstrom

    Abstract: Reducing the energy expended to carry out a computational task is important. In this work, we explore the prospects of meeting Quality-of-Service requirements of tasks on a multi-core system while adjusting resources to expend a minimum of energy. This paper considers, for the first time, a QoS-driven coordinated resource management algorithm (RMA) that dynamically adjusts the size of the per-core…

    Submitted 12 November, 2019; originally announced November 2019.

    Comments: Submitted to the Journal of Parallel and Distributed Computing (Nov 2019)

  18. arXiv:1905.00673  [pdf, other]

    cs.DC

    An Adaptive Performance-oriented Scheduler for Static and Dynamic Heterogeneity

    Authors: Jing Chen, Pirah Noor Soomro, Mustafa Abduljabbar, Miquel Pericàs

    Abstract: With the emergence of heterogeneous hardware paving the way for the post-Moore era, it is of high importance to adapt the runtime scheduling to the platform's heterogeneity. To enhance adaptive and responsive scheduling, we introduce a Performance Trace Table (PTT) into XiTAO, a framework for elastic scheduling of mixed-mode parallelism. The PTT is an extensible and dynamic lightweight manifest of…

    Submitted 30 December, 2020; v1 submitted 2 May, 2019; originally announced May 2019.

  19. arXiv:1901.05907  [pdf, other]

    cs.DC

    High performance scheduling of mixed-mode DAGs on heterogeneous multicores

    Authors: Agnes Rohlin, Henrik Fahlgren, Miquel Pericas

    Abstract: Many HPC applications can be expressed as mixed-mode computations, in which each node of a computational DAG is itself a parallel computation that can be molded at runtime to allocate different amounts of processing resources. At the same time, modern HPC systems are becoming increasingly heterogeneous to address the requirements of energy efficiency. Effectively using heterogeneous devices is com…

    Submitted 8 July, 2019; v1 submitted 17 January, 2019; originally announced January 2019.

    Comments: Presented at HIP3ES, 2019. European Commission Project: LEGaTO - Low Energy Toolset for Heterogeneous Computing (EC-H2020-780681)

    Report number: HIP3ES/2019/1