Showing 1–12 of 12 results for author: Goumas, G

  1. arXiv:2406.06900  [pdf, other]

    cs.DC

    SmartPQ: An Adaptive Concurrent Priority Queue for NUMA Architectures

    Authors: Christina Giannoula, Foteini Strati, Dimitrios Siakavaras, Georgios Goumas, Nectarios Koziris

    Abstract: Concurrent priority queues are widely used in important workloads, such as graph applications and discrete event simulations. However, designing scalable concurrent priority queues for NUMA architectures is challenging. Even though several NUMA-oblivious implementations can scale up to a high number of threads, exploiting the potential parallelism of insert operation, NUMA-oblivious implementation…

    Submitted 10 June, 2024; originally announced June 2024.

  2. arXiv:2302.04225  [pdf]

    cs.DC

    Feature-based SpMV Performance Analysis on Contemporary Devices

    Authors: Panagiotis Mpakos, Dimitrios Galanopoulos, Petros Anastasiadis, Nikela Papadopoulou, Nectarios Koziris, Georgios Goumas

    Abstract: The SpMV kernel is characterized by high performance variation per input matrix and computing platform. While GPUs were considered State-of-the-Art for SpMV, with the emergence of advanced multicore CPUs and low-power FPGA accelerators, we need to revisit its performance and energy efficiency. This paper provides a high-level SpMV performance analysis based on structural features of matrices relat…

    Submitted 8 February, 2023; originally announced February 2023.

    Comments: to appear at IPDPS'23

  3. arXiv:2301.09674  [pdf, other]

    cs.AR cs.DC cs.PF

    Architectural Support for Efficient Data Movement in Disaggregated Systems

    Authors: Christina Giannoula, Kailong Huang, Jonathan Tang, Nectarios Koziris, Georgios Goumas, Zeshan Chishti, Nandita Vijaykumar

    Abstract: Resource disaggregation offers a cost effective solution to resource scaling, utilization, and failure-handling in data centers by physically separating hardware devices in a server. Servers are architected as pools of processor, memory, and storage devices, organized as independent failure-isolated components interconnected by a high-bandwidth network. A critical challenge, however, is the high p…

    Submitted 23 January, 2023; originally announced January 2023.

    Comments: To appear in the Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS) 2023 and the ACM SIGMETRICS 2023 conference. arXiv admin note: text overlap with arXiv:2301.00414

  4. arXiv:2301.00414  [pdf, other]

    cs.AR cs.DC cs.PF

    DaeMon: Architectural Support for Efficient Data Movement in Disaggregated Systems

    Authors: Christina Giannoula, Kailong Huang, Jonathan Tang, Nectarios Koziris, Georgios Goumas, Zeshan Chishti, Nandita Vijaykumar

    Abstract: Resource disaggregation offers a cost effective solution to resource scaling, utilization, and failure-handling in data centers by physically separating hardware devices in a server. Servers are architected as pools of processor, memory, and storage devices, organized as independent failure-isolated components interconnected by a high-bandwidth network. A critical challenge, however, is the high p…

    Submitted 18 January, 2023; v1 submitted 1 January, 2023; originally announced January 2023.

    Comments: To appear in the Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS) 2023 and the ACM SIGMETRICS 2023 conference

  5. arXiv:2204.00900  [pdf, ps, other]

    cs.AR cs.DC cs.PF

    Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Systems

    Authors: Christina Giannoula, Ivan Fernandez, Juan Gómez-Luna, Nectarios Koziris, Georgios Goumas, Onur Mutlu

    Abstract: Several manufacturers have already started to commercialize near-bank Processing-In-Memory (PIM) architectures. Near-bank PIM architectures place simple cores close to DRAM banks and can yield significant performance and energy improvements in parallel applications by alleviating data access costs. Real PIM systems can provide high levels of parallelism, large aggregate memory bandwidth and low me…

    Submitted 2 April, 2022; originally announced April 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2201.05072

  6. arXiv:2201.05072  [pdf, other]

    cs.AR cs.DC cs.PF

    SparseP: Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Systems

    Authors: Christina Giannoula, Ivan Fernandez, Juan Gómez-Luna, Nectarios Koziris, Georgios Goumas, Onur Mutlu

    Abstract: Several manufacturers have already started to commercialize near-bank Processing-In-Memory (PIM) architectures. Near-bank PIM architectures place simple cores close to DRAM banks and can yield significant performance and energy improvements in parallel applications by alleviating data access costs. Real PIM systems can provide high levels of parallelism, large aggregate memory bandwidth and low me…

    Submitted 23 May, 2022; v1 submitted 13 January, 2022; originally announced January 2022.

    Comments: To appear in the Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS) 2022 and the ACM SIGMETRICS 2022 conference

  7. arXiv:2101.07557  [pdf, other]

    cs.AR cs.DC

    SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures

    Authors: Christina Giannoula, Nandita Vijaykumar, Nikela Papadopoulou, Vasileios Karakostas, Ivan Fernandez, Juan Gómez-Luna, Lois Orosa, Nectarios Koziris, Georgios Goumas, Onur Mutlu

    Abstract: Near-Data-Processing (NDP) architectures present a promising way to alleviate data movement costs and can provide significant performance and energy benefits to parallel applications. Typically, NDP architectures support several NDP units, each including multiple simple cores placed close to memory. To fully leverage the benefits of NDP and achieve high performance for parallel workloads, efficien…

    Submitted 13 February, 2021; v1 submitted 19 January, 2021; originally announced January 2021.

    Comments: To appear in the 27th IEEE International Symposium on High-Performance Computer Architecture (HPCA-27)

  8. arXiv:2006.02768  [pdf, other]

    cs.LG stat.ML

    Weight Pruning via Adaptive Sparsity Loss

    Authors: George Retsinas, Athena Elafrou, Georgios Goumas, Petros Maragos

    Abstract: Pruning neural networks has regained interest in recent years as a means to compress state-of-the-art deep neural networks and enable their deployment on resource-constrained devices. In this paper, we propose a robust compressive learning framework that efficiently prunes network parameters during training with minimal computational overhead. We incorporate fast mechanisms to prune individual lay…

    Submitted 4 June, 2020; originally announced June 2020.

  9. arXiv:1905.11910  [pdf, other]

    cs.LG stat.ML

    RecNets: Channel-wise Recurrent Convolutional Neural Networks

    Authors: George Retsinas, Athena Elafrou, Georgios Goumas, Petros Maragos

    Abstract: In this paper, we introduce Channel-wise recurrent convolutional neural networks (RecNets), a family of novel, compact neural network architectures for computer vision tasks inspired by recurrent neural networks (RNNs). RecNets build upon Channel-wise recurrent convolutional (CRC) layers, a novel type of convolutional layer that splits the input channels into disjoint segments and processes them i…

    Submitted 19 March, 2020; v1 submitted 28 May, 2019; originally announced May 2019.

  10. Performance Analysis and Optimization of Sparse Matrix-Vector Multiplication on Modern Multi- and Many-Core Processors

    Authors: Athena Elafrou, Georgios Goumas, Nektarios Koziris

    Abstract: This paper presents a low-overhead optimizer for the ubiquitous sparse matrix-vector multiplication (SpMV) kernel. Architectural diversity among different processors together with structural diversity among different sparse matrices lead to bottleneck diversity. This justifies an SpMV optimizer that is both matrix- and architecture-adaptive through runtime specialization. To this direction, we pre…

    Submitted 15 November, 2017; originally announced November 2017.

    Comments: 10 pages, 7 figures, ICPP 2017

  11. arXiv:1601.07400  [pdf, other]

    cs.DC

    Improving virtual host efficiency through resource and interference aware scheduling

    Authors: Evangelos Angelou, Konstantinos Kaffes, Athanasia Asiki, Georgios Goumas, Nectarios Koziris

    Abstract: Modern Infrastructure-as-a-Service Clouds operate in a competitive environment that caters to any user's requirements for computing resources. The sharing of the various types of resources by diverse applications poses a series of challenges in order to optimize resource utilization while avoiding performance degradation caused by application interference. In this paper, we present two scheduling…

    Submitted 27 January, 2016; originally announced January 2016.

    Comments: 2nd International Workshop on Dynamic Resource Allocation and Management in Embedded, High Performance and Cloud Computing DREAMCloud 2016 (arXiv:cs/1601.04675)

    Report number: DREAMCloud/2016/02

  12. arXiv:1511.02494  [pdf, other]

    cs.PF

    A lightweight optimization selection method for Sparse Matrix-Vector Multiplication

    Authors: Athena Elafrou, Georgios Goumas, Nectarios Koziris

    Abstract: In this paper, we propose an optimization selection methodology for the ubiquitous sparse matrix-vector multiplication (SpMV) kernel. We propose two models that attempt to identify the major performance bottleneck of the kernel for every instance of the problem and then select an appropriate optimization to tackle it. Our first model requires online profiling of the input matrix in order to detect…

    Submitted 10 January, 2016; v1 submitted 8 November, 2015; originally announced November 2015.

    Comments: 10 pages